This document provides the initial detailed description of the project use cases (UCs) and the initial requirements. Regarding the use cases, the goal is to define them with specific details about their implementation and their use of the project tools, with a particular focus on combining data from different data sources and platforms to demonstrate the potential for secure combination and sharing of data across sites. Regarding the requirements, the aim is to gather requirements from various stakeholders, industrial applications, the UCs and the concept of EU Data Spaces, and to analyse them in order to extract the functional and non-functional requirements for making the data marketplace decentralised, trustworthy, interoperable and open to new (open) data, with intelligent, AI-based and energy-efficient data management tools capable of providing high-quality data and services to consumers.
Automated Machine Learning (Auto-ML) is an emerging technology that automates the tasks involved in building, training, and deploying machine learning models [1]. With the increasing ubiquity of machine learning, there is an ever-growing demand for specialized data scientists and machine learning experts. However, not all organizations have the resources to hire these experts. Auto-ML software platforms address this issue by enabling organizations to utilize machine learning more easily, even without specialized experts.
Auto-ML platforms can be obtained from third-party vendors, accessed through open-source repositories like GitHub, or developed internally. These platforms automate many of the tedious and error-prone tasks involved in machine learning, freeing up data scientists' time to focus on more complex tasks. Auto-ML uses advanced algorithms and techniques to optimize the model and improve its accuracy, leading to better results.
One of the key benefits of Auto-ML is that it reduces the risk of human error. Since many of the tasks involved in machine learning are tedious and repetitive, there is a high chance of error when performed manually. Auto-ML automates these tasks, reducing the risk of human error and improving the overall accuracy of the model. In addition to reducing errors, Auto-ML also provides transparency by documenting the entire process. This makes it easier for researchers to understand how the model was developed and to replicate the process. Auto-ML can also be used by teams of data scientists, enabling collaboration and sharing of insights.
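The core loop that Auto-ML automates can be sketched in a few lines: enumerate candidate model configurations, score each on training data, and keep the best. The toy dataset, the threshold-based "model", and the search space below are illustrative assumptions, not part of any particular Auto-ML platform.

```python
import random

# Minimal sketch of the core Auto-ML idea: automatically evaluating a
# search space of model configurations and selecting the best performer.
# Here the "model" is a simple decision threshold on a 1-D feature.

random.seed(0)
# Toy dataset: label is 1 when the feature exceeds 0.6, with ~5% label noise.
data = [(random.random(),) for _ in range(200)]
data = [(x, 1 if x > 0.6 or random.random() < 0.05 else 0) for (x,) in data]
train, valid = data[:150], data[150:]

def accuracy(threshold, rows):
    """Fraction of rows correctly classified by a given threshold."""
    return sum((1 if x > threshold else 0) == y for x, y in rows) / len(rows)

# "Search space": candidate thresholds, scored automatically on training data.
candidates = [i / 20 for i in range(1, 20)]
best = max(candidates, key=lambda t: accuracy(t, train))
print(f"best threshold={best:.2f}, validation accuracy={accuracy(best, valid):.2f}")
```

Real Auto-ML systems apply the same select-by-score principle over far richer spaces (model families, hyperparameters, preprocessing pipelines) and use smarter search strategies than exhaustive enumeration, such as Bayesian optimisation.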
A related, widely discussed tool is Auto-GPT, an open-source application built on top of large language models that chains model calls together to pursue a goal with minimal human input. Language-model tooling of this kind can support a range of natural language processing tasks, including text classification, sentiment analysis, and language translation. By automating such workflows, these tools let researchers focus on more complex tasks, such as data analysis and model deployment. This is just one example of how automation is reshaping the field of machine learning and making it more accessible to organizations of all sizes.
SEDIMARK aims to enhance data quality and reduce the reliance on domain experts in the data curation process. To accomplish this objective, the SEDIMARK team at the Insight Centre for Data Analytics of University College Dublin (UCD) is actively exploring the utilization of Auto-ML techniques. By leveraging Auto-ML, SEDIMARK strives to optimize its data curation process and minimize the involvement of domain experts, leading to more efficient and accurate results.
[1] He, Xin, Kaiyong Zhao, and Xiaowen Chu. "AutoML: A survey of the state-of-the-art." Knowledge-Based Systems 212 (2021): 106622.
In the digital age, streaming data - information that is generated and processed in real-time - is abundant. Applying Artificial Intelligence (AI) to mine this data holds immense value. It enables real-time decision-making and provides immediate insights, which is particularly beneficial for industries like finance, healthcare, and transportation, where instant responses can make a significant difference.
However, mining streaming data with AI is not without challenges [1]. The sheer volume and speed of the data make it difficult for conventional data mining methods to keep up. It demands high-speed processing and robust algorithms to handle real-time analysis. Furthermore, maintaining data quality and integrity is paramount, but challenging in a real-time context. Ensuring privacy and security of the data while mining it also poses significant obstacles. And, given the 'black box' nature of many AI systems, transparency and understanding of the data mining process can also be a concern.
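One common answer to the volume and velocity challenges above is online learning: the model is updated one sample at a time and never needs the full dataset in memory. Below is a minimal sketch using an online perceptron; the simulated sensor stream, decision boundary, and learning rate are illustrative assumptions.

```python
import random

# Sketch of online (streaming) learning: each sample is seen once,
# the model is updated immediately, and nothing is stored.

random.seed(1)

def stream(n):
    """Simulate a sensor stream: label is 1 when x0 + x1 exceeds 1."""
    for _ in range(n):
        x = (random.random(), random.random())
        yield x, 1 if x[0] + x[1] > 1.0 else 0

w, b, lr = [0.0, 0.0], 0.0, 0.1
errors = 0
for x, y in stream(5000):
    pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
    if pred != y:                      # online perceptron update on mistakes
        errors += 1
        sign = 1 if y == 1 else -1
        w[0] += lr * sign * x[0]
        w[1] += lr * sign * x[1]
        b += lr * sign

# Evaluate the learned model on fresh samples from the same stream.
correct = sum(
    (1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == y
    for x, y in stream(1000)
)
print(f"mistakes while streaming: {errors}, accuracy afterwards: {correct / 1000:.2f}")
```

Production stream-mining systems add mechanisms this sketch omits, notably concept-drift detection (the underlying distribution may change mid-stream) and bounded-memory sketches for summary statistics.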
SEDIMARK, a secure decentralized and intelligent data and services marketplace, is making strides to address these issues. The Insight Centre for Data Analytics in University College Dublin contributes to the development of innovative AI technologies capable of efficiently handling and mining streaming data. By combining advanced distributed AI technologies with a strong commitment to ethical guidelines, SEDIMARK is paving the way for a future where AI-driven insights from streaming data can be harnessed effectively, reliably, and ethically. Our aim is to transform the challenges of real-time data processing into opportunities, enhancing decision-making capabilities and fostering a more data-driven world.
[1] Gomes, Heitor Murilo, et al. "Machine learning for streaming data: state of the art, challenges, and opportunities." ACM SIGKDD Explorations Newsletter 21.2 (2019): 6-22.
In today's world, Artificial Intelligence (AI) is widespread and used in many different areas, such as the tech industry, financial services, health care, retail and manufacturing, to name just a few. The main driver behind the surge of AI applications is its ability to extract useful information from very large datasets.
Despite the incredible positives AI has brought in recent years, it has also sparked numerous doubts about its trustworthiness. Some of the issues flagged include the lack of understanding of the algorithms used, in many cases described as black boxes. Similarly, it is often unclear what sort of data is used in the training process of the AI system. Since AI systems learn from the data they are provided, it is crucial that this data does not contain biased human decisions or reflect unbalanced social biases.
To address these and many more trust issues in the emerging AI systems, the European Commission appointed the High-Level Expert Group on AI, and in 2019 this group presented the Ethics Guidelines for Trustworthy AI. The outcome of these guidelines is that trustworthy AI should be lawful, ethical and robust, and this should be achieved by addressing the following 7 key requirements: human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, non-discrimination and fairness; societal and environmental well-being; and accountability.
In SEDIMARK, our goal is to develop cutting-edge AI technology, such as machine learning and deep learning, to enhance the experience of its users. Along this path, we aim to follow the Trustworthy AI guidelines throughout the lifecycle of our project and beyond, so that the AI developed and used in this project can be fully trusted by its users.
The SEDIMARK team in the Insight Centre for Data Analytics of University College Dublin (UCD) aims to exploit Insight’s expertise to promote ethical AI research within SEDIMARK and help the rest of the partners towards ensuring that the AI modules developed within the project follow the Ethical AI requirements.
ARTEMIS is a product of WINGS oriented to the proactive management of water, energy and gas infrastructures.
Based on the WINGS approach, it combines advanced technologies (IoT, AI, advanced networks and visualisations) with domain knowledge to address diverse use cases. As a management system, it delivers a range of monitoring and management functionalities.
Commercial traction has been achieved, while further interest is being stimulated in various areas and with various prospective partners.
In parallel, WINGS strives to develop and integrate further advances, and a wave of new projects related to ARTEMIS activities is being implemented. Among these, SEDIMARK aims to create a secure decentralised data marketplace based on distributed ledger technology and AI.
Within SEDIMARK, WINGS contributes to the marketplace (leveraging its experience in other vertical sectors, like food security and safety) and to the AI strategies.
SEDIMARK will empower European stakeholders to set the proper foundation for the energy market, expand their competences, and compete and scale at a global level.
This document is a deliverable of the SEDIMARK project, funded by the European Commission under its Horizon Europe Framework Programme. This document presents the “D6.2 Dissemination and exploitation plan” deliverable, including the expected impact of the ongoing and planned activities, target audience, milestones, and mechanisms to assess the dissemination and exploitation activities carried out throughout the project execution.
Dissemination activities are any actions related to the public disclosure of the project results by any appropriate means, including scientific publications. Communication activities, on the other hand, also include the promotion of the project itself to multiple audiences, including both the media and the public. Distinguishing between the two plans is important: the communication plan covers the project itself as well as its results, whereas the dissemination plan concerns only the results.
Moreover, exploitation activities have a broader scope than communication and dissemination. They can include actions such as using the project results in further research activities beyond those covered by the project, developing, creating and marketing a product or process, creating and providing a service, or contributing to standardisation activities.
Machine Learning (ML) is a modern and efficient branch of Artificial Intelligence (AI) specialised in recognising patterns in data. It can provide precise, statistics-based analysis to extract insights from large datasets; some of its methods, such as artificial neural networks, are loosely inspired by the neural networks in our brain. Every system equipped with ML must learn and discover patterns from historical data and compare its predictions with real data before it can provide reliable information. That is why such systems are trained with as much data as possible.
ML algorithms can be more efficient than traditional modelling methods and, thanks to their computational power, can outperform humans on specific tasks. For instance, image recognition and time series analysis are well-known and widespread application domains of ML in real-world cases, such as the EU-funded SEDIMARK project. SEDIMARK aims to build a secure, trusted and intelligent decentralised data and services marketplace over several years, using ML to automate data quality management. Over time, the project's results should become ever more accurate and precise as its data sources grow.
ML could be used directly on edge systems to ensure data quality. Some algorithms are specialised for this purpose, with low power consumption and a modest memory footprint. For instance, EdgeML and TinyML are open-source libraries that target exactly this setting.
The IoT platform from EGM (i.e. the EdgeSpot) is compatible with both libraries and could manage and distribute FAIR data in an energy-efficient way. ONNX (Open Neural Network Exchange), an open format for representing ML models, may help in selecting the right combination of tools. Finally, with the help of the use cases provided within SEDIMARK, the project can elaborate a concrete strategy to automate and manage data quality.
SEDIMARK plans to build a distributed registry of resources stored on edge systems, close to where data is generated. The purpose is to clean, label, validate and anonymise data.
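Two of the edge-side steps mentioned above, validating and anonymising data, are lightweight enough to sketch in pure Python. The field names, the sample readings, and the 2-sigma outlier rule below are illustrative assumptions, not SEDIMARK's actual pipeline.

```python
import hashlib
import statistics

# Illustrative sketch of edge-side data curation: flag suspect readings
# (validation) and replace raw identifiers with salted hashes (anonymisation).

readings = [
    {"sensor_id": "meter-17", "value": v}
    for v in [20.1, 19.8, 20.4, 21.0, 55.0, 20.2, 19.9]
]

values = [r["value"] for r in readings]
mean, stdev = statistics.mean(values), statistics.pstdev(values)

def curate(record):
    out = dict(record)
    # Validate: flag values more than 2 standard deviations from the mean.
    out["suspect"] = abs(record["value"] - mean) > 2 * stdev
    # Anonymise: replace the raw identifier with a truncated salted hash.
    digest = hashlib.sha256(b"salt" + record["sensor_id"].encode()).hexdigest()
    out["sensor_id"] = digest[:12]
    return out

curated = [curate(r) for r in readings]
print(sum(r["suspect"] for r in curated), "suspect reading(s) flagged")
```

A real deployment would use a robust statistic (e.g. median absolute deviation, since the mean itself is skewed by outliers), a per-deployment secret salt, and model-based rather than rule-based validation; the structure of the per-record curation step stays the same.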
A digital twin is, at its highest level, an architectural construct enabled by a combination of technology streams such as IoT (Internet of Things), Cloud Computing, Edge Computing, Fog Computing, Artificial Intelligence, Robotics, Machine Learning and Big Data Analytics.
The digital twin concept is based on the idea that every physical part has a virtual counterpart that is conceptually, structurally and functionally the same as the physical part. The concept dates back to the 1970s, when NASA applied a similar approach during the Apollo 13 mission. Nowadays, Digital Twins are used in various industries and are a key concept in realizing the communication mechanism between the physical and virtual worlds using data.
The primary use cases for Digital Twins are asset performance, utilization, and optimization. A Digital Twin enables monitoring, diagnostic and forecasting capabilities for a specific use case.
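The monitoring and forecasting roles mentioned above can be illustrated with a minimal sketch: a virtual object mirrors a physical asset's sensor readings and raises a warning when a naive forecast crosses a limit. The pump asset, the temperature limit, and the linear extrapolation are illustrative assumptions.

```python
# Toy digital twin: mirrors a physical pump's temperature readings and
# warns when a simple linear forecast exceeds a safe operating limit.

class PumpTwin:
    LIMIT = 80.0  # assumed safe operating limit, in degrees Celsius

    def __init__(self):
        self.history = []

    def update(self, temperature):
        """Mirror one sensor reading from the physical pump."""
        self.history.append(temperature)

    def forecast(self, steps=3):
        """Naive linear extrapolation from the last two readings."""
        if len(self.history) < 2:
            return self.history[-1] if self.history else None
        trend = self.history[-1] - self.history[-2]
        return self.history[-1] + trend * steps

    def needs_attention(self):
        f = self.forecast()
        return f is not None and f > self.LIMIT

twin = PumpTwin()
for t in [61.0, 64.5, 68.2, 72.1]:   # readings arriving from the field
    twin.update(t)
print("warning:", twin.needs_attention())
```

Industrial twins replace the two-point extrapolation with physics-based or learned models, but the loop is the same: ingest state from the physical asset, simulate forward, and act on the prediction.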
Examples of Digital Twin application scenarios can be found across many industries.
Digital Twins must also be considered in their relationship to Data Spaces. This requires a larger overview, including systemic oversight and supporting infrastructure.
International Data Spaces (IDS) provides data space technologies and concepts for various application domains that enable standardized data exchange and integration in a trusted environment. The International Data Spaces Association (IDSA) is a non-profit organization that promotes IDS architecture as an international standard in a variety of fields, including healthcare, mobility, agriculture, and more.
It is expected that, in the medium term and driven by specific requirements, collaboration solutions combining centralized data storage in one or more clouds with distributed data storage and efficient data processing will be realized by combining Digital Twins with Data Spaces.
With the help of the use cases provided within SEDIMARK, the project can elaborate a concrete strategy in which this relationship between Digital Twins and Data Spaces proves its real value.
Digital solutions are important to the energy industry because they can help the energy system become more flexible, reliable, and efficient while also making it possible to incorporate renewable energy sources.
Data Marketplaces may be especially useful for the energy industry because they make it possible to share and analyse large amounts of data from a variety of sources, such as energy production and consumption data from smart meters, weather data, and information about the grid. A better understanding and management of the energy system helps energy companies and grid operators with energy storage, grid management, and the integration of renewable energy sources. Data Spaces can also enable new business models, such as peer-to-peer energy trading and the integration of electric vehicles into the energy system.
According to the European Strategy for Data, the European Commission wants to create a society and economy driven by data by unlocking the value of data and ensuring that it can be shared and reused across various industries. The strategy also emphasizes the significance of data in achieving the EU's energy and climate goals.
The Green Deal is a plan by the European Commission to make the EU's economy sustainable by turning climate and environmental problems into opportunities in all sectors and making the transition fair and inclusive for everyone. Its digital dimension includes smart grid and metering systems, digital platforms for sharing and analysing energy data, and the integration of distributed energy resources such as electric vehicles and small-scale renewable energy production.
However, the question of how to apply these ideas to the current energy landscape is still unanswered. For instance, how can we guarantee that frameworks for data governance and management are in place to facilitate data access while safeguarding privacy and security? How can we encourage the development and implementation of digital technologies and data solutions in the energy sector, in terms of integrating renewable energy sources and creating novel business models? And how can we make sure that the benefits of digitalization are available to everyone, especially in terms of reducing energy poverty and ensuring access to energy? These are some of the crucial questions that must be answered to fully realize the potential of digital technologies and data for the energy sector, and SEDIMARK will try to contribute to solving at least some of them. Get aboard the SEDIMARK cruise and share the experience with us!