Automated Machine Learning (Auto-ML) is an emerging technology that automates the tasks involved in building, training, and deploying machine learning models [1]. With the increasing ubiquity of machine learning, there is an ever-growing demand for specialized data scientists and machine learning experts. However, not all organizations have the resources to hire these experts. Auto-ML software platforms address this issue by enabling organizations to utilize machine learning more easily, even without specialized experts.
Auto-ML platforms can be obtained from third-party vendors, accessed through open-source repositories like GitHub, or developed internally. These platforms automate many of the tedious and error-prone tasks involved in machine learning, freeing up data scientists' time to focus on more complex tasks. Auto-ML uses advanced algorithms and techniques to optimize the model and improve its accuracy, leading to better results.
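To make the idea of automated optimization more concrete, here is a minimal sketch of one ingredient found in many Auto-ML tools: trying several model configurations and keeping the one that scores best on held-out data. Everything in it is an assumption made for illustration (the toy data, the k-nearest-neighbour model, the candidate values and the function names); it is not how any particular Auto-ML platform works.

```python
import random

def knn_predict(train, x, k):
    """Predict the majority label (0 or 1) among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0

def accuracy(train, valid, k):
    """Share of validation points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)

def search_best_k(train, valid, candidates):
    """Score every candidate k on the validation set and return the best one."""
    scores = {k: accuracy(train, valid, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores

def make_toy_data(n, seed):
    """Hypothetical 1-D data: label is 1 when x > 5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        label = 1 if x > 5.0 else 0
        if rng.random() < 0.1:
            label = 1 - label
        data.append((x, label))
    return data

train = make_toy_data(200, seed=0)
valid = make_toy_data(100, seed=1)
best_k, scores = search_best_k(train, valid, candidates=[1, 3, 5, 9, 15])
print("best k:", best_k, "validation accuracy:", scores[best_k])
```

Real Auto-ML systems search far larger spaces (model families, preprocessing steps, architectures) and use smarter strategies than an exhaustive loop, but the select-by-validation-score principle is the same.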
One of the key benefits of Auto-ML is that it reduces the risk of human error. Since many of the tasks involved in machine learning are tedious and repetitive, there is a high chance of error when they are performed manually. Auto-ML automates these tasks, reducing the risk of human error and improving the overall accuracy of the model. In addition to reducing errors, Auto-ML also provides transparency by documenting the entire process. This makes it easier for researchers to understand how the model was developed and to replicate the process. Auto-ML can also be used by teams of data scientists, enabling collaboration and sharing of insights.
Auto-GPT is a popular tool that is often mentioned alongside Auto-ML. Strictly speaking, it is not a language model itself but an open-source application built on top of GPT-family language models, which use deep learning to generate human-like text. Such models can be used for a range of natural language processing tasks, including text classification, sentiment analysis, and language translation. By automating multi-step, text-based work, tools like Auto-GPT let researchers focus on more complex tasks, such as data analysis and model deployment. This is one example of how automation is making machine learning more accessible to organizations of all sizes.
SEDIMARK aims to enhance data quality and reduce the reliance on domain experts in the data curation process. To accomplish this objective, the SEDIMARK team at the Insight Centre for Data Analytics of University College Dublin (UCD) is actively exploring the use of Auto-ML techniques. By leveraging Auto-ML, SEDIMARK strives to optimize its data curation process and minimize the involvement of domain experts, leading to more efficient and accurate results.
[1] He, Xin, Kaiyong Zhao, and Xiaowen Chu. "AutoML: A survey of the state-of-the-art." Knowledge-Based Systems 212 (2021): 106622.
In the digital age, streaming data - information that is generated and processed in real-time - is abundant. Applying Artificial Intelligence (AI) to mine this data holds immense value. It enables real-time decision-making and provides immediate insights, which is particularly beneficial for industries like finance, healthcare, and transportation, where instant responses can make a significant difference.
However, mining streaming data with AI is not without challenges [1]. The sheer volume and speed of the data make it difficult for conventional data mining methods to keep up. It demands high-speed processing and robust algorithms to handle real-time analysis. Furthermore, maintaining data quality and integrity is paramount, but challenging in a real-time context. Ensuring privacy and security of the data while mining it also poses significant obstacles. And, given the 'black box' nature of many AI systems, transparency and understanding of the data mining process can also be a concern.
SEDIMARK, a secure decentralized and intelligent data and services marketplace, is making strides to address these issues. The Insight Centre for Data Analytics in University College Dublin contributes to the development of innovative AI technologies capable of efficiently handling and mining streaming data. By combining advanced distributed AI technologies with a strong commitment to ethical guidelines, SEDIMARK is paving the way for a future where AI-driven insights from streaming data can be harnessed effectively, reliably, and ethically. Our aim is to transform the challenges of real-time data processing into opportunities, enhancing decision-making capabilities and fostering a more data-driven world.
[1] Gomes, Heitor Murilo, et al. "Machine learning for streaming data: state of the art, challenges, and opportunities." ACM SIGKDD Explorations Newsletter 21.2 (2019): 6-22.
In today's world, Artificial Intelligence (AI) is widespread and used in many different areas, such as the tech industry, financial services, health care, retail and manufacturing, to name just a few. The main driver behind the surge of AI applications is its ability to extract useful information from very large datasets.
Despite the incredible positives AI has brought in recent years, it has also sparked numerous doubts about its trustworthiness. Some of the issues flagged include the lack of understanding of the algorithms used, which in many cases are described as black boxes. Similarly, it is often unclear what sort of data is used in the training process of the AI system. Since AI systems learn from the data they are provided, it is crucial that this data does not contain biased human decisions or reflect unbalanced social biases.
To address these and many more trust issues in emerging AI systems, the European Commission appointed the High-Level Expert Group on AI, and in 2019 this group presented the Ethics Guidelines for Trustworthy AI. The outcome of these guidelines is that trustworthy AI should be lawful, ethical and robust, and that this should be achieved by addressing the following 7 key requirements:
- Human Agency and Oversight - allowing humans to make informed decisions and fostering their fundamental human rights, while also ensuring proper human oversight of the AI system.
- Technical Robustness and Safety - AI systems need to be safe, accurate, reliable and reproducible.
- Privacy and Data Governance - respecting user privacy alongside ensuring the quality and integrity of the data.
- Transparency - AI transparency is achieved through the explainability of the AI systems and their decisions.
- Diversity, non-discrimination and fairness - The AI system must avoid unfair bias while being accessible to all.
- Societal and Environmental well-being - it must be ensured that the AI system is sustainable and environmentally friendly.
- Accountability - accountability and responsibility for AI systems as well as their outcomes must be ensured.
In SEDIMARK, our goal is to develop cutting-edge AI technology, such as machine learning and deep learning, to enhance the experience of SEDIMARK users. Along the way, we aim to follow the Trustworthy AI guidelines throughout the lifecycle of our project and beyond, so that the AI developed and used in this project can be fully trusted by its users.
The SEDIMARK team in the Insight Centre for Data Analytics of University College Dublin (UCD) aims to exploit Insight’s expertise to promote ethical AI research within SEDIMARK and help the rest of the partners towards ensuring that the AI modules developed within the project follow the Ethical AI requirements.
Data has become a new currency in the current data-driven economy. The EU has launched the EU data market monitoring tool, which continuously monitors the impact of the data economy in the member states. The tool has identified that in 2021 there were more than 190,000 data supplier companies and more than 560,000 data users in the EU. The revenues of data companies stood at 84 billion euros in 2022, with forecasts of 114 billion in 2025 and 137 billion in 2030.
Data quality is a persistent issue in the ongoing digital transformation of European businesses. Some studies estimate that the curation and cleaning of data can take up as much as 80% of data professionals' time, which is time not spent on producing insights or actual products and services. The quality of data used within a business is of utmost importance, since according to reports, bad or dirty data costs the US 3 trillion USD per year. For instance, many business datasets contain a high number of anomalies, duplicates, errors and missing values, which degrade the value of the data for making business decisions. Unhandled bias in the data, such as when a dataset fails to account properly for minority labels (e.g. for gender, age, origin, or any other label in the data), can result in machine learning models that are skewed towards unfair outcomes. Additionally, much of the existing data in business silos lacks the proper documentation and annotation that would allow professionals to leverage it for downstream decision-making tasks.
One of the main pillars of the SEDIMARK project is to promote data quality for the data that will be shared on the marketplace. SEDIMARK will build a complete data curation and quality improvement pipeline that will be provided to the data providers so that they can assess the quality of their data and clean them in order to improve the quality. This data curation pipeline will require minimum human intervention from domain experts to provide optimal results, but will also be fully customisable for experts to unlock maximum performance. This will be achieved by exploring state of the art techniques in Auto-ML (automated machine learning) and Meta-ML, both of which can be applied to transform the data with minimal human supervision, by learning from previous tasks.
Additionally, the marketplace within SEDIMARK will prioritise and promote data providers who undertake the effort to curate their data before sharing them widely. SEDIMARK will implement a range of transparent data quality metrics that will show statistics about the data and these will be displayed side by side with the data offerings on the marketplace. This will help consumers to find high quality data so that they minimise the time they spend preprocessing the data for their services. Moreover, an efficient recommender system within SEDIMARK will also help consumers to easily find high quality, highly rated offerings within their domain.
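As an illustration of what such quality statistics could look like, the sketch below computes three simple indicators that relate to the issues mentioned earlier: missing values, exact duplicates and label imbalance. The metric names, the record layout and the `quality_report` function are hypothetical choices made for this example; the source does not specify which metrics SEDIMARK will implement.

```python
from collections import Counter

def quality_report(rows, label_key):
    """Compute simple quality statistics for a list of dict records.

    Assumes all field values are hashable (numbers, strings, None).
    """
    if not rows:
        raise ValueError("quality_report needs at least one record")
    n = len(rows)
    fields = sorted({key for row in rows for key in row})
    # Share of records in which each field is absent or None.
    missing = {f: sum(1 for row in rows if row.get(f) is None) / n for f in fields}
    # Exact duplicates: records whose full content repeats an earlier record.
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    # Label balance: share of the rarest label among labelled records.
    labels = Counter(row[label_key] for row in rows if row.get(label_key) is not None)
    minority_share = min(labels.values()) / sum(labels.values()) if labels else 0.0
    return {
        "rows": n,
        "missing_ratio": missing,
        "duplicate_ratio": duplicates / n,
        "minority_label_share": minority_share,
    }

# Hypothetical sensor records used only to exercise the function.
records = [
    {"sensor": "a", "value": 1.0, "label": "ok"},
    {"sensor": "a", "value": 1.0, "label": "ok"},    # exact duplicate
    {"sensor": "b", "value": None, "label": "ok"},   # missing value
    {"sensor": "c", "value": 7.5, "label": "fault"},
]
report = quality_report(records, label_key="label")
print(report)
```

Statistics of this kind are cheap to compute and easy to display next to an offering, which is why they are a natural starting point before more advanced checks such as anomaly detection.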
In conclusion, SEDIMARK aims to provide tools that improve data quality for both providers and consumers, boosting the data economy. These tools will also save data scientists significant time, allowing them to focus on extracting value from high-quality data and producing new products and services instead of spending the majority of their time cleaning the data themselves. The SEDIMARK team at the Insight Centre for Data Analytics of University College Dublin (UCD) builds on Insight’s data expertise and is leading the activities to define efficient metrics for data quality and to develop an automated and simplified data curation and quality improvement pipeline that data providers and users can use to check and improve their datasets.
Data interoperability refers to the ability of information systems to exchange data and enable information sharing. More specifically, it is defined as the ability of systems and services that create, exchange, and consume data to have clear, shared expectations for the format, contents, context, and meaning of that data. It thus allows users to access and process data from multiple sources in diverse formats without losing meaning, and then to integrate that data for mapping, visualization, and other forms of representation and analysis. Data interoperability enables people to find, explore, and understand the structure and content of heterogeneous data.
In this context, SEDIMARK aims to provide an enriched secure decentralized data and services marketplace where scattered data from various domains and geographical locations within the EU can be easily generated, cleaned, protected, discovered, and enriched with metadata, AI and analytics, and exploited for diverse business and research scenarios. SEDIMARK involves a combination of heterogeneous data, and achieving data interoperability will allow it to maximize the value of the data and overcome the significant challenges posed by distributed assets (heterogeneity, data formats, sources, etc). For this to happen, SEDIMARK will reuse the semantic models developed in previous and ongoing EU initiatives, such as Gaia-X, IDS and NGSI-LD, and propose extensions to them to create one generic semantic model able to annotate and enrich heterogeneous data from multiple domains semantically.
Besides data, interoperability between the AI models that emerge from this data is of great interest. In the decentralized environment of SEDIMARK, decentralized training requires that users train their models locally and then exchange model weights to jointly learn a global model. Expecting all SEDIMARK users to use exactly the same machine learning platform and exactly the same machines for training is unrealistic. SEDIMARK models will therefore be agnostic to the underlying platforms, and SEDIMARK will provide tools to convert models to various formats and to support running models on machines of various capabilities and on various platforms.
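The local-training-then-exchange idea can be sketched in a few lines. The example below assumes federated averaging (weights averaged in proportion to each participant's dataset size) and a tiny linear model; the source does not state which aggregation rule or model type SEDIMARK uses, so the function names, data and learning rate are illustrative assumptions only.

```python
def local_step(weights, data, lr=0.1):
    """One gradient-descent step for y = w0 + w1 * x on a participant's own data."""
    g0 = g1 = 0.0
    for x, y in data:
        err = (weights[0] + weights[1] * x) - y
        g0 += err
        g1 += err * x
    m = len(data)
    return [weights[0] - lr * g0 / m, weights[1] - lr * g1 / m]

def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical participants; their raw data never leaves this list entry.
# Both datasets follow y = 1 + 2x.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

global_w = [0.0, 0.0]
for _ in range(2000):
    # Each participant updates the shared model locally...
    local = [local_step(global_w, data) for data in clients]
    # ...and only the resulting weights are exchanged and combined.
    global_w = federated_average(local, [len(d) for d in clients])

print(global_w)  # approaches [1.0, 2.0]
```

Because only plain lists of numbers are exchanged, nothing in this loop depends on which framework or hardware each participant used to produce its weights, which is the property the platform-agnostic approach relies on.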
ARTEMIS is the WINGS product for the proactive management of water, energy, and gas infrastructures.
Based on the WINGS approach, it combines advanced technologies (IoT, AI, advanced networks and visualizations) with domain knowledge to address diverse use cases. As a management system, it delivers the following functionalities:
- Efficient metering: optimized information flow and cost with 24/7 capability, prediction of demand and of capabilities.
- Fault management: faulty meters, predictive maintenance, outage handling (energy), leakage or flood avoidance (water).
- Performance optimizations: optimization of water quality, maximization of revenue water, optimization of the deployment of renewables and of storage components, optimization for residences, businesses and factories.
- Configuration and security aspects.
Commercial traction has been achieved, while further interest is stimulated in various areas and with various tentative partners.
In parallel WINGS strives to develop and integrate further advances. A wave of new projects related to ARTEMIS activities is being implemented. SEDIMARK aims to create a secure decentralised data marketplace based on distributed ledger technology and AI. Under this new approach,
- Data will no longer be stored only on the “core cloud” but also on “edge systems”, close to where they are generated, thus reducing security concerns.
- According to diverse strategies, data will be “cleaned”, labelled and classified, in accordance with legal / ethical frameworks and FAIR (findable, accessible, interoperable and reusable) principles, for enabling easy linkage and efficient utilization.
- Diverse analysis mechanisms can be powered.
Within SEDIMARK, WINGS contributes to the marketplace (leveraging its experience in other vertical sectors, like food security and safety) and to the AI strategies.
SEDIMARK will empower European stakeholders to set the proper foundation for the energy market, expand their competences, and compete and scale at a global level.
This document is a deliverable of the SEDIMARK project, funded by the European Commission under its Horizon Europe Framework Programme. This document presents the “D6.2 Dissemination and exploitation plan” deliverable, including the expected impact of the ongoing and planned activities, target audience, milestones, and mechanisms to assess the dissemination and exploitation activities carried out throughout the project execution.
Dissemination activities are any actions related to the public disclosure of the project results by any appropriate means, including scientific publications. Communication activities, on the other hand, also include the promotion of the project itself to multiple audiences, including both the media and the public. Distinguishing the concepts and goals of the dissemination plan and the communication plan is important: the communication plan is about the project and its results, whilst the dissemination plan is only about the results.
Moreover, exploitation activities have a broader scope compared to communication and dissemination. They can include actions such as utilizing the project results in further research activities other than those covered by the concerned project, developing, creating and marketing a product or process, creating and providing a service, or even in standardisation activities.
SEDIMARK recognises the importance of regulating data management within a context such as the one posed by the project. A solution will be considered in which consortium partners deposit all underlying information on the data-related business processes (data storage, data provisioning, processing etc.) of the SEDIMARK solution clearly and transparently.
The purpose of the Data Management Action Plan (DMAP) is to identify the main data management elements that apply to the SEDIMARK project and the consortium. This document is the first version of the DMAP and will be reviewed as soon as there is a clearer understanding of the types of data that will be collected.
Given the wide range of sources from which data will be collected or become available within the project, this document outlines that the consortium partners will consider embracing and applying the Guidelines on FAIR Data Management in Horizon 2020 and Horizon Europe (HE), “In general terms your data should be ‘FAIR’, that is Findable, Accessible, Interoperable and Re-usable”, as information about the data to be collected becomes clearer.
As the name suggests, SEDIMARK will be a Data and Service marketplace. But SEDIMARK's focus is not only on data and service assets: Decentralisation also plays a key role…
D as Decentralisation
Decentralisation avoids reliance on a single, central authority for control and decision-making. Instead, it enables multiple independent parties to interact directly with each other.
There are several perks in a decentralised system:
- Reduced weakness: relying too much on one entity can lead to systemic failures. Having multiple entities shields the system from unfortunate events.
- Optimization of resources: in a decentralized system, the resources available can be spread among multiple entities to provide better services.
- Security and Trust: in a decentralized network, security and trust are a necessary precondition.
The SEDIMARK Marketplace achieves security and trust thanks to Distributed Ledger Technologies (DLT).
D as DLT
A DLT is a network composed of several nodes that independently replicate, share, and synchronize the same data spread across many different physical locations without a central administrator.
The most famous example is the Blockchain, which is widely used today for financial transactions with the Bitcoin cryptocurrency. However, the SEDIMARK decentralised architecture will be based on a different DLT, the IOTA Tangle, designed and deployed by the IOTA Foundation. The IOTA Tangle is an open, feeless and highly scalable distributed ledger, designed to support both data and value transfer in a green, energy-efficient fashion.
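For readers who like to see the idea in code, here is a toy sketch of the general principle behind a distributed ledger: every node keeps its own copy of the records, and each record is linked to the previous one by a hash, so a modified copy can be detected. It uses a simple hash chain and a naive "adopt a longer valid copy" rule purely as assumptions for illustration. It is not the IOTA Tangle, which is structured differently, and it is not SEDIMARK code.

```python
import copy
import hashlib
import json

def entry_hash(payload, prev_hash):
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def verify(ledger):
    """Check that every entry links to its predecessor and is unmodified."""
    prev = ""
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry["payload"], prev):
            return False
        prev = entry["hash"]
    return True

class Node:
    """A participant holding its own full copy of the ledger."""

    def __init__(self):
        self.ledger = []

    def append(self, payload):
        prev = self.ledger[-1]["hash"] if self.ledger else ""
        self.ledger.append(
            {"payload": payload, "prev": prev, "hash": entry_hash(payload, prev)}
        )

    def sync_from(self, peer):
        # Toy rule (an assumption): adopt a peer's copy only if it verifies
        # and is longer than the local one.
        if verify(peer.ledger) and len(peer.ledger) > len(self.ledger):
            self.ledger = copy.deepcopy(peer.ledger)

a, b = Node(), Node()
a.append({"offer": "dataset-1"})
a.append({"offer": "dataset-2"})
b.sync_from(a)                                 # b now holds an independent replica
b.ledger[0]["payload"]["offer"] = "tampered"   # modify b's copy only
print(verify(a.ledger), verify(b.ledger))      # prints: True False
```

Since no node accepts a copy that fails verification, a single participant cannot silently rewrite history for the others, which is the kind of trust a marketplace without a central administrator needs.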
Do you want to know more? Stay tuned for our next blog posts by signing up to our newsletter.