Enhancing Data Interoperability and Quality with Data Formatter and Mapper

SEDIMARK · June 28, 2024

In the modern era of big data, integrating and analyzing data from various sources has become increasingly complex. Different data providers often use diverse formats and structures, which makes data interoperability hard to achieve. This complexity calls for robust mechanisms to convert and harmonize data so that it can be used effectively for analysis and decision-making. SEDIMARK has identified two critical components in this process and is actively working on them: the data formatter and the data mapper.

A data formatter converts data from various providers, each using different formats, into the standardized NGSI-LD format. This standardization is crucial because it allows data from disparate sources to be compared, combined, and analyzed consistently. Without a data formatter, the heterogeneity of data formats would be a significant barrier to interoperability. For example, one provider's data might be in XLSX format, another's in JSON, and yet another's in CSV. A data formatter processes these different formats and transforms them into a unified format that SEDIMARK tools can easily manage and analyze.

A data mapper comes into play after data processing: it stores the data and maps it to a specific data model. This involves not only aligning the data with the model but also enriching it with quality metrics and metadata. At this stage, the data mapper adds data-quality information obtained during the data processing step, such as detected outliers and their anomaly scores, as well as missing and redundant data. This enriched data model becomes a powerful asset for future analyses, giving a complete picture of the data.

By converting various data formats into a standard format and then mapping and enriching the data, SEDIMARK achieves a higher level of data integration. This process ensures that data from multiple sources can be used together seamlessly, enabling more accurate and comprehensive analyses. Moreover, including data quality metrics during mapping adds a layer of reliability and trustworthiness to the data. Information about outliers, missing data, and redundancy is crucial for data scientists and analysts, as it allows them to make informed decisions and apply appropriate processing techniques.
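As a rough illustration of the data mapper's enrichment step, the Python sketch below attaches quality information to NGSI-LD entities. The post does not say how SEDIMARK detects outliers or redundancy, so `assess_quality` is only a stand-in for the data-processing step: it scores anomalies with the modified z-score based on the median absolute deviation (MAD), treats a reading as missing when its value is `None`, and treats an exact repeat of an earlier (timestamp, value) pair as redundant. The mapper then attaches the results as NGSI-LD sub-properties (properties of a property) and entity-level flags. The attribute names `measurement`, `anomalyScore`, `isOutlier`, `isMissing`, and `isRedundant`, as well as the entity type and URN scheme, are hypothetical.

```python
import json
import statistics
from typing import List, Optional, Tuple

Reading = Tuple[str, Optional[float]]  # (ISO 8601 timestamp, value); None = missing


def assess_quality(readings: List[Reading], threshold: float = 3.5) -> List[dict]:
    """Stand-in for the data-processing step (assumed methods, not SEDIMARK's).

    - missing: the value is None
    - redundant: the exact (timestamp, value) pair already appeared earlier
    - anomalyScore: modified z-score 0.6745 * |x - median| / MAD
    - outlier: anomalyScore above `threshold` (3.5 is a common rule of thumb)
    """
    present = [v for _, v in readings if v is not None]
    median = statistics.median(present) if present else 0.0
    mad = statistics.median([abs(v - median) for v in present]) if present else 0.0
    seen = set()
    report = []
    for reading in readings:
        value = reading[1]
        # With MAD == 0 the score is undefined; record None rather than guess.
        score = None if value is None or mad == 0 else 0.6745 * abs(value - median) / mad
        report.append({
            "missing": value is None,
            "redundant": reading in seen,
            "anomalyScore": score,
            "outlier": score is not None and score > threshold,
        })
        seen.add(reading)
    return report


def map_reading(entity_type: str, sensor_id: str, index: int,
                reading: Reading, quality: dict) -> dict:
    """Map one reading and its quality results onto an NGSI-LD entity.

    All attribute names (measurement, anomalyScore, isOutlier, isMissing,
    isRedundant) and the URN scheme are hypothetical.
    """
    timestamp, value = reading
    entity = {
        "id": f"urn:ngsi-ld:{entity_type}:{sensor_id}:{index}",
        "type": entity_type,
        "isMissing": {"type": "Property", "value": quality["missing"]},
        "isRedundant": {"type": "Property", "value": quality["redundant"]},
    }
    if value is not None:  # a missing reading gets no measurement Property at all
        measurement = {"type": "Property", "value": value, "observedAt": timestamp}
        # Quality results become sub-properties (properties of the Property).
        if quality["anomalyScore"] is not None:
            measurement["anomalyScore"] = {
                "type": "Property", "value": round(quality["anomalyScore"], 3)}
        measurement["isOutlier"] = {"type": "Property", "value": quality["outlier"]}
        entity["measurement"] = measurement
    return entity


def map_series(entity_type: str, sensor_id: str, readings: List[Reading]) -> List[dict]:
    """Run the quality stand-in, then map every reading with its results."""
    quality = assess_quality(readings)
    return [map_reading(entity_type, sensor_id, i, r, q)
            for i, (r, q) in enumerate(zip(readings, quality))]


if __name__ == "__main__":
    readings = [
        ("2024-06-01T00:00:00Z", 20.0),
        ("2024-06-01T01:00:00Z", 21.0),
        ("2024-06-01T02:00:00Z", None),  # missing value
        ("2024-06-01T03:00:00Z", 95.0),  # likely outlier
        ("2024-06-01T03:00:00Z", 95.0),  # exact duplicate, flagged as redundant
        ("2024-06-01T04:00:00Z", 21.5),
    ]
    print(json.dumps(map_series("TemperatureReading", "s1", readings), indent=2))
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value inflates the standard deviation enough to hide itself in small samples, whereas the median-based score stays stable.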
