SEDIMARK D3.3 “Enabling tools for data interoperability, distributed data storage and training distributed AI models” is a report produced by the SEDIMARK project. It aims at providing insights in relation to progress made for:
Data quality is considered to be of the highest importance for companies to improve their decision-making systems and the efficiency of their products. In this current data-driven era, it is important to understand the effect that “dirty” or low-quality data (i.e., data that is inaccurate, incomplete, inconsistent, or contains errors) can have on a business. Manual data cleaning is the common way to process data, accounting for more than 50% of the time of knowledge workers. SEDIMARK acknowledges the importance of data quality for both sharing and using/analysing data to extract knowledge and information for decision-making processes.
Common types of dirty or low-quality data include:
Thus, one of the main work items of SEDIMARK is to develop a usable data processing pipeline that assesses and improves the quality of data generated and shared by the SEDIMARK data providers.
This document, along with all the public deliverables and documents produced by SEDIMARK, can be found in the Publications & Resources section.