
In today’s data-driven world, finding relevant datasets is crucial for researchers, data scientists and businesses. This has led to the development of dataset recommendation systems. Just as the movie recommendation system used by Netflix guides users to discover the most relevant films, a dataset recommender system aims to help users navigate the complex landscape of dataset discovery efficiently.

Recommender systems analyse user behaviour to make intelligent suggestions. This can be achieved with a variety of techniques: content-based filtering approaches focus on item descriptions and recommend items similar to those the user has previously interacted with; collaborative filtering approaches recommend items based on the interactions of similar users; and hybrid approaches combine the two.
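As a toy illustration of the content-based approach described above, the sketch below ranks datasets by the cosine similarity of their bag-of-words descriptions. The catalogue, descriptions and tokenisation are invented for illustration; this is not SEDIMARK's actual recommender.

```python
from collections import Counter
import math

def cosine_sim(a, b):
    # Cosine similarity between two bag-of-words vectors (Counters).
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(liked_description, catalogue, k=2):
    # Rank catalogue items by textual similarity to a dataset the user liked.
    liked = Counter(liked_description.lower().split())
    scored = [(name, cosine_sim(liked, Counter(desc.lower().split())))
              for name, desc in catalogue.items()]
    return [name for name, _ in sorted(scored, key=lambda x: -x[1])[:k]]

catalogue = {
    "air-quality": "hourly air quality sensor measurements for Helsinki",
    "traffic-flow": "vehicle traffic flow counts from city detectors",
    "art-history": "metadata about nineteenth century paintings",
}
print(recommend("air pollution sensor measurements Helsinki", catalogue, k=1))
# ['air-quality']
```

A real system would replace the word counts with TF-IDF or learned embeddings, but the ranking-by-similarity structure is the same.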

High-quality recommender systems provide many benefits to users, such as efficiency (automating the search for relevant items), personalisation (improving user satisfaction) and enhanced discovery (exposing users to items they may not otherwise be aware of).

It is clear that an efficient recommender system has many advantages in various domains, and dataset recommendation is no different. However, with the exponential growth of data, often residing at various locations, finding the right dataset for a specific task or project has become increasingly challenging [1]. Moreover, numerous datasets lack high-quality descriptions, making discovery even harder [2]. This is particularly important for content-based recommender systems, as they rely on high-quality metadata: insufficient dataset metadata therefore directly undermines the quality of the recommendations built on it.

In SEDIMARK, we aim to address the challenge of poor quality metadata in dataset recommendation with the development of novel techniques for dataset metadata enrichment. With automatic and efficient metadata enrichment, SEDIMARK can improve the overall user experience and dataset discoverability and drive better decision-making for the future.

[1] Chapman, Adriane, et al. "Dataset search: a survey." The VLDB Journal 29.1 (2020): 251-272.

[2] Reis, Juan Ribeiro, Flavia Bernardini, and Jose Viterbo. "A new approach for assessing metadata completeness in open data portals." International Journal of Electronic Government Research (IJEGR) 18.1 (2022): 1-20.

Have you ever wondered how a smart city manages to keep everything from urban planning to environmental monitoring running smoothly? The answer lies in something called Spatial Data Infrastructure (SDI). While it might sound technical, the SDI framework plays a crucial role in making geographic information accessible and integrated, benefiting everyone.

Imagine a world where data about locations – from urban planning maps to environmental monitoring systems – is at your fingertips. SDI turns this vision into reality. By connecting data, technology, and people, SDI helps improve decision-making and efficiency in numerous areas of our lives.

Smart City: SEDIMARK Helsinki Pilot and Spatial Data

The SEDIMARK Helsinki pilot aims to demonstrate how Digital Twin technology can revolutionize urban mobility, with spatial data as the backbone. SEDIMARK's context broker (NGSI-LD) handles linked data, property graphs and semantics using three main constructs: Entities, Properties and Relationships. This integration opens up opportunities for new services and the development of a functional city, aiming to enhance geospatial data integration within urban digital twins. In Helsinki, the approach focuses on transitioning from a monolithic architecture to a modular, API-driven one, developing Digital Twin viewers and tools, and collaborating on city-wide geospatial data.
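To give a feel for the three NGSI-LD constructs, here is a minimal entity sketched as a Python dictionary. The URN identifiers and attribute names are hypothetical, not taken from the Helsinki pilot: the entity describes a traffic observation (Entity), carries a measured value (Property), and links to a road segment (Relationship).

```python
import json

# Hypothetical NGSI-LD-style entity; identifiers are illustrative only.
entity = {
    "id": "urn:ngsi-ld:TrafficFlowObserved:helsinki-001",
    "type": "TrafficFlowObserved",
    "intensity": {            # Property: a value describing the entity
        "type": "Property",
        "value": 197,
        "observedAt": "2024-05-01T08:00:00Z",
    },
    "refRoadSegment": {       # Relationship: a link to another entity
        "type": "Relationship",
        "object": "urn:ngsi-ld:RoadSegment:example-12",
    },
    "@context": "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
}
print(json.dumps(entity, indent=2))
```

In a deployment, such a payload would be POSTed to the context broker's entity endpoint rather than printed.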

Join us on this journey as we dive into the world of Spatial Data Infrastructure and see how it's making our city smarter, more efficient, and better prepared for the future.

Photo credit. https://materialbank.myhelsinki.fi/images/attraction?sort=popularity&openMediaId=6614

When we think of data, especially from diverse traffic sources, beauty isn't typically the first thing that comes to mind. Instead, we imagine numbers, graphs, and charts, all designed to convey information quickly and efficiently. However, what if we could see data not just as a tool for analysis, but as a source of inspiration, capable of producing visuals as captivating as a masterpiece by Vincent van Gogh? Just like van Gogh's "Starry Night" finds beauty in complexity and chaos, we can render data into beautiful, meaningful visualizations.

The Complexity of Traffic Data

Traffic data is inherently complex. It comes from a variety of interoperable systems and devices. Each source provides a different perspective, capturing the flow of vehicles, the density of traffic, and the speed of travel at any given time. When combined, these data points create a comprehensive picture of urban movement.

From Chaos to Clarity

Much like art that seems chaotic yet is harmonious, raw traffic data can appear overwhelming. However, through careful visualization and simulation, patterns and insights emerge. Advanced algorithms process the data, identifying trends and correlations that aren't immediately apparent. For instance, heat maps can show areas of high congestion, while flow diagrams can illustrate the movement of vehicles through a city over time.

The beauty of data

Data visualization is an art form in its own right. The choice of colors, shapes, and lines can transform a simple graph into a work of art. For example, a time-lapse visualization of traffic flow can resemble the dynamic motion of a city, with its streams of vehicles.

Helsinki mobility digital twin

The Helsinki mobility digital twin paves the way for a future where cities leverage data. This data-driven revolution, fueled by powerful data visualization, holds immense potential for creating a more efficient, sustainable, and safer urban transportation landscape.

So, can traffic data be beautiful? Absolutely. All it takes is the right perspective and a touch of creativity to turn numbers into a work of art.


In the modern era of big data, the challenge of integrating and analyzing data from various sources has become increasingly complex. Different data providers often use diverse formats and structures, leading to significant challenges in achieving data interoperability. This complexity necessitates robust mechanisms to convert and harmonize data, ensuring they can be effectively used for analysis and decision-making. SEDIMARK has identified two critical components in this process and is actively working on them: data formatter and data mapper.

A data formatter is designed to convert data from various providers, each using different formats, into the standardized NGSI-LD format. This standardization is crucial because it allows data from disparate sources to be compared, combined, and analyzed in a consistent manner. Without a data formatter, the heterogeneity of data formats would pose a significant barrier to interoperability. For example, data from one provider might be in XLSX format, another's in JSON, and yet another's in CSV. A data formatter processes these different formats, transforming them into a unified format that can be easily managed and analyzed by SEDIMARK tools.
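A minimal sketch of the formatting idea, assuming a simple CSV input and an invented helper `csv_to_ngsild`; SEDIMARK's actual formatter is not shown here. Each remaining column becomes an NGSI-LD-style Property attribute.

```python
import csv
import io

def csv_to_ngsild(csv_text, entity_type, id_field):
    # Convert CSV rows into NGSI-LD-style entities; every column
    # except the id column becomes a Property. Values stay as
    # strings in this sketch (a real formatter would also cast types).
    entities = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entity = {
            "id": f"urn:ngsi-ld:{entity_type}:{row.pop(id_field)}",
            "type": entity_type,
        }
        for key, value in row.items():
            entity[key] = {"type": "Property", "value": value}
        entities.append(entity)
    return entities

raw = "station,temperature\nhki-01,12.5\nhki-02,11.8\n"
print(csv_to_ngsild(raw, "WeatherObserved", "station")[0]["id"])
# urn:ngsi-ld:WeatherObserved:hki-01
```

Analogous readers for JSON or XLSX inputs would feed the same entity-building step, which is what makes the downstream tooling format-agnostic.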

A data mapper comes into play after data processing: it stores the data and maps it to a specific data model. This process involves not only aligning the data with the model but also enriching it with quality metrics and metadata. During this stage, the data mapper adds valuable information about data quality obtained during the data processing step, such as identified outliers and their corresponding anomaly scores, as well as missing and redundant data. This enriched data model becomes a powerful asset for future analyses, giving a complete picture of the data.

By converting various data formats into a standard format and then mapping and enriching the data, SEDIMARK achieves a higher level of data integration. This process ensures that data from multiple sources can be used together seamlessly, facilitating more accurate and comprehensive analyses. Moreover, the inclusion of data quality metrics during the mapping process adds a layer of reliability and trustworthiness to the data. Information about outliers, missing data, and redundancy is crucial for data scientists and analysts, as it allows them to make informed decisions and apply appropriate processing techniques.
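As a rough sketch of the kind of quality metadata a mapper might attach, the hypothetical helper below counts missing and duplicate values and flags outliers with a simple z-score; the metrics actually computed by the SEDIMARK pipeline are not shown here.

```python
import statistics

def quality_annotations(values, z_threshold=3.0):
    # Hypothetical quality metadata: missing-value count, duplicate
    # count, and z-score-based anomaly scores per reading.
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    stdev = statistics.pstdev(present) or 1.0  # guard against zero spread
    scores = [abs(v - mean) / stdev if v is not None else None
              for v in values]
    return {
        "missingCount": values.count(None),
        "duplicateCount": len(present) - len(set(present)),
        "outlierIndices": [i for i, s in enumerate(scores)
                           if s is not None and s > z_threshold],
        "anomalyScores": scores,
    }

readings = [12.1, 12.3, None, 12.2, 12.1, 99.9]
print(quality_annotations(readings, z_threshold=1.5)["outlierIndices"])
# [5]
```

The resulting dictionary is the sort of enrichment that could travel with the mapped entity as metadata, so analysts downstream know which readings to treat with caution.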

The first letter in SEDIMARK stands for Secure. How is security involved in SEDIMARK? In this blog post we present an overview of the Security and Trust Domain within SEDIMARK!

Nowadays, the proliferation of large amounts of data makes it essential to ensure the security and integrity of the information exchanged. Traditionally, centralized data marketplaces face security challenges such as data manipulation, unauthorized access, and lack of transparency.

In response to these challenges, Distributed Ledger Technology (DLT) has emerged as an alternative solution, offering decentralized (see "The letter D in SEDIMARK"), immutable, and transparent data exchange mechanisms.

Enhancing Security in Data Exchange

Centralized data marketplaces are susceptible to various security vulnerabilities, including single points of failure and data breaches.

Using DLT mitigates these risks: control is decentralized, and cryptographic mechanisms ensure security.

In the SEDIMARK Marketplace, participants can securely exchange data without relying on third-party intermediaries, reducing the risk of unwanted data manipulation or unauthorized access to their data (or, more generally, their assets).

Security Features

SEDIMARK will employ (... or is it already?!) key features enabled by DLT, such as smart contracts, Self-Sovereign Identity (SSI), and cryptographic primitives to enhance security and transparency of the Marketplace.

Smart contracts automate the execution of agreements between parties, ensuring trustless and tamper-proof transactions.

SSI allows users of the Marketplace to retain full control over their own identity, without relying on centralized authorities (see A Matter of Identities).

Finally, cryptographic primitives are the underlying functions that ensure data security and integrity.

Ensuring Data Origin

Cryptographic functions such as digests create a practically unique mathematical fingerprint for a given asset.

Recording (or "anchoring") this value onto the DLT creates an immutable data trail.

So every user can be certain of the origin of the asset they are purchasing.

This also adds transparency, enhancing trust in this distributed marketplace.
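The fingerprinting idea can be sketched with a standard SHA-256 digest. The anchoring step is only simulated here by keeping the digest in a variable; in the Marketplace that value would be recorded on the DLT.

```python
import hashlib

def fingerprint(asset_bytes):
    # SHA-256 digest: a practically unique fingerprint of the asset.
    return hashlib.sha256(asset_bytes).hexdigest()

# Pretend this digest was anchored on the DLT at publication time.
original = b"temperature,12.5\ntemperature,11.8\n"
anchored = fingerprint(original)

# Later, anyone can re-hash what they received and compare.
tampered = b"temperature,99.9\ntemperature,11.8\n"
print(fingerprint(original) == anchored)   # True: asset unchanged
print(fingerprint(tampered) == anchored)   # False: tampering detected
```

Because the digest on the ledger is immutable, a buyer can verify an asset's origin and integrity without trusting the seller or any intermediary.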

SEDIMARK exploits traditional cryptographic mechanisms as well as DLT to freshen up data (asset!) exchange mechanisms and to secure the Marketplace.

Do you want to know more? Stay tuned for next blog posts by signing up to our newsletter below.

Follow us on @Twitter / X and LinkedIn.

* Source image: shutterstock

The current document is the second deliverable of WP5 and reports the results of the Task 5.2 activities on continuous platform integration. These activities concern the integration of the modules and components developed in other work packages (especially WP3 “Distributed data quality management and interoperability” and WP4 “Secure data sharing in a decentralized Marketplace”), based on the architecture previously defined in WP2. The target audience of the document is mainly the technical partners of SEDIMARK, but it offers valuable insights to other stakeholders.

Before delving into the core of the deliverable, the document briefly analyses the implementation status of the first integrated release of the SEDIMARK platform, including deviations and challenges. Seven independent PoC scenarios will ensure the proper functioning of the platform. All technology providers are accountable for the modules to which they are assigned, based on the top-down integration plan also outlined in SEDIMARK_D5.1 [1]. Some architecture components are not included in the platform's first version because they are part of the second and final releases.

All scenarios will be thoroughly analysed by their leaders on the following topics:

  • High-level description: Scope of the scenario, relation to other modules or scenarios
  • Step-by-step definition and data flows: Same approach as described in SEDIMARK_D5.1, including updates on the steps of each scenario, accompanied by figures where applicable, related datasets and the data flows applied.
  • Specifications of involved components: Brief description of the components used (references or links to the respective deliverable could be used).
  • Integration specification: Integration steps (inter-component communication, setup monitoring and logging, implement CI/CD).
  • Results: Output of each scenario, available figures and screenshots, results analysis.
  • Guidelines for deployment and execution: Guidelines, tips, or tricky points during the execution of the scenario.
  • Software licences: Identify licences for each component per scenario.

The D5.2 deliverable can be downloaded from here.

The “D6.3 Dissemination and Impact creation activities. First version” deliverable presents the ongoing and completed activities, the communication and dissemination material, and the current status of the Key Performance Indicators (KPIs) for those activities. It also covers the current efforts towards cooperation with other projects and associations.
During the first half of the project lifetime, a number of dissemination and communication activities have been carried out, reaching a large and varied audience, including users, citizens, other research projects and the scientific community.
The content of the current document will be continued in the deliverable SEDIMARK_D6.4 Dissemination and Impact creation activities, which is due in M36 (September 2025).

This document, along with all the public deliverables and documents produced by SEDIMARK, can be found in the Publications & Resources section.

In recent years of IoT expansion, data has become a business of the utmost importance, driving crucial decision-making systems at EU and global level and impacting all domains: industry, politics and the economy, society and individuals, as well as the environment.

As the volume of incoming data being collected, stored and processed constantly expands, most systems and techniques for absorbing such data efficiently, appropriately and at scale are lacking or rapidly overwhelmed by technological change. Of equal concern is the quantity of circulating private and sensitive information linked to individuals and organizations. As a consequence, data is often insufficiently managed and maintained, misunderstood due to its complexity, lacking in quality, and ill-adapted to large-scale AI analytics, which in turn leads to inappropriate handling, sharing and misuse of data across borders and domains, even when it nominally conforms to the European GDPR and to FAIR* principles!

For this reason, SEDIMARK uses the data orchestrator Mage.ai to: (i) better organize the integration of multiple data sources, applications, toolboxes, services and systems; (ii) make data workflows scalable, improving performance and reducing bottlenecks; (iii) ensure data consistency, harmony and the highest quality; (iv) guarantee data privacy and security compliant with EU regulations through anonymization and decentralized systems; and (v) minimize and mitigate potential risks by automating schedules for data and system maintenance, monitoring and alerting procedures. On top of all this, the orchestrator enables all data actors to easily manage, adapt and visualize the state of their data.
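As a toy illustration of what an orchestrator coordinates (this is not Mage.ai's actual API), the sketch below runs a load, clean, publish pipeline in order and logs each stage, the same pattern an orchestrator applies at scale with scheduling, retries and alerting on top:

```python
# Invented stages standing in for real pipeline blocks.
def load():
    return [12.1, None, 12.3]          # raw readings, one missing

def clean(data):
    return [v for v in data if v is not None]

def publish(data):
    return {"records": len(data)}      # summary handed downstream

def run_pipeline(stages):
    # Execute stages in order, passing each output to the next,
    # and keep a simple monitoring log.
    data, log = None, []
    for stage in stages:
        data = stage() if data is None else stage(data)
        log.append(f"{stage.__name__}: ok")
    return data, log

result, log = run_pipeline([load, clean, publish])
print(result)  # {'records': 2}
```

Real orchestrators express the same idea as a dependency graph rather than a flat list, which is what lets independent branches run in parallel.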

(*) Findable, Accessible, Interoperable, Reusable

In today's world, the synergy between geosciences and machine learning is at the forefront of addressing global concerns. The management of water resources, for example, has become a critical issue, and the integration of geosciences and machine learning is emerging as an innovative solution to such problems.

Geosciences provide a fundamental understanding of water systems. By analyzing geological data, scientists can understand the impact of environmental factors on water systems and assess risks to human settlements, risks to the environment, and the risk of scarcity.

Machine learning brings predictive analytics into this matter, offering the ability to forecast future trends based on historical data. In water management, ML algorithms can predict usage patterns, potential pollution incidents, and the impact of climate change on water resources. This predictive capability is invaluable in planning and implementing strategies for sustainable water usage and conservation.
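As a toy stand-in for the forecasting models mentioned above, the sketch below fits a straight line to hypothetical yearly usage figures by ordinary least squares and extrapolates one step ahead. The data and the linear model are illustrative only; real water-demand models would use richer features and algorithms.

```python
def linear_forecast(history, steps_ahead=1):
    # Fit y = slope * x + intercept by ordinary least squares over
    # the historical series, then extrapolate steps_ahead points.
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope * (n - 1 + steps_ahead) + intercept

usage = [100.0, 104.0, 108.0, 112.0]   # hypothetical yearly demand
print(linear_forecast(usage, steps_ahead=1))  # 116.0
```

A forecast trending above sustainable supply is the kind of signal that would trigger the early interventions discussed below.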

Case Studies and Applications

Using geological data and historical consumption patterns, machine learning models can predict areas at risk of water scarcity or flood, allowing for early intervention.

Machine learning algorithms can analyze data from various sources to detect and predict pollution levels in water bodies, enabling timely measures to protect water quality.

By combining geological data with climate models, machine learning can forecast the long-term impacts of climate change on water resources, guiding various adaptation strategies.

This interdisciplinary approach not only enhances our understanding of water systems but also equips us with the tools to make informed and sustainable decisions.

Geosciences provide the foundational 'what' and 'why', while machine learning offers the 'when' and 'how'. This combination can provide a strategy for creating efficient, intelligent, and sustainable solutions for urban environments and industrial applications, which can be of interest to large companies like Siemens.
