Dataset Recommender: The Power of High-Quality Metadata

SEDIMARK · September 25, 2024

In today’s data-driven world, finding relevant datasets is crucial for researchers, data scientists and businesses. This has led to the development of dataset recommendation systems. Just as the movie recommendation system used by Netflix guides users to the most relevant movies, a dataset recommender system aims to guide users efficiently through the complex landscape of dataset discovery.

Recommender systems analyse user behaviour to make intelligent suggestions. This can be achieved with a variety of techniques: content-based filtering approaches focus on item descriptions and recommend items similar to those the user has previously interacted with; collaborative filtering approaches recommend items based on the interactions of similar users; and hybrid approaches combine content-based and collaborative filtering.
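To make the content-based approach concrete, here is a minimal sketch in Python. It is an illustration only, not SEDIMARK's implementation: the catalogue, the dataset names and the helper functions are all hypothetical, and TF-IDF with cosine similarity is just one common way to compare item descriptions.

```python
import math
from collections import Counter

# Hypothetical catalogue: dataset identifier -> free-text description.
DATASETS = {
    "air_quality": "hourly air quality sensor readings city pollution",
    "traffic_flow": "city traffic flow sensor counts road",
    "water_quality": "river water quality sensor readings",
    "movie_ratings": "user movie ratings reviews",
}


def tfidf_vectors(docs):
    """Turn each description into a sparse TF-IDF vector (term -> weight)."""
    tokenised = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenised)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(t for tokens in tokenised.values() for t in set(tokens))
    vectors = {}
    for name, tokens in tokenised.items():
        tf = Counter(tokens)
        vectors[name] = {
            # Smoothed inverse document frequency keeps every weight positive.
            t: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(seen, docs, top_k=2):
    """Rank unseen datasets by similarity to those the user already used."""
    vectors = tfidf_vectors(docs)
    # The user profile is the sum of the vectors of previously used datasets.
    profile = Counter()
    for name in seen:
        for term, weight in vectors[name].items():
            profile[term] += weight
    scores = {
        name: cosine(profile, vec)
        for name, vec in vectors.items()
        if name not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(recommend(["air_quality"], DATASETS))
```

In this toy catalogue, a user who worked with the air-quality dataset is pointed first to the water-quality dataset, because the two descriptions share the most terms. The ranking depends entirely on the words in the descriptions, which is why this family of methods is sensitive to metadata quality.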

High-quality recommender systems provide many benefits to users, such as efficiency through automating the search for relevant items, personalisation that improves user satisfaction, and enhanced discovery that exposes users to items they may not be aware of.

An efficient recommender system clearly has advantages in many domains, and dataset recommendation is no different. However, with the exponential growth of data, often residing at various locations, finding the right dataset for a specific task or project has become increasingly challenging [1]. Moreover, numerous datasets lack high-quality descriptions, making discovery even harder [2]. This is particularly problematic for content-based recommender systems, which rely on item descriptions: insufficient dataset metadata directly limits the quality of the recommendations they can produce.
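As a rough illustration of what "insufficient metadata" can mean in practice, the sketch below scores a metadata record by the share of expected fields that are actually filled in. The field list and the scoring rule are assumptions made for this example; this is neither the assessment method proposed in [2] nor a SEDIMARK component.

```python
# Hypothetical set of fields a catalogue might expect for every dataset.
EXPECTED_FIELDS = ("title", "description", "keywords", "license", "publisher")


def is_filled(value):
    """Treat None, empty collections and blank strings as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness(metadata, fields=EXPECTED_FIELDS):
    """Return the fraction of expected fields that carry a usable value."""
    filled = sum(1 for field in fields if is_filled(metadata.get(field)))
    return filled / len(fields)


record = {"title": "Air quality", "description": "  ", "keywords": ["air"]}
print(completeness(record))  # 0.4: only title and keywords are usable
```

A record with a low score gives a content-based recommender very little text to compare, so it is a natural candidate for metadata enrichment.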

In SEDIMARK, we aim to address the challenge of poor-quality metadata in dataset recommendation by developing novel techniques for dataset metadata enrichment. With automatic and efficient metadata enrichment, SEDIMARK can improve the overall user experience and dataset discoverability, and drive better decision-making.

[1] Chapman, Adriane, et al. "Dataset search: a survey." The VLDB Journal 29.1 (2020): 251-272.

[2] Reis, Juan Ribeiro, Flavia Bernardini, and Jose Viterbo. "A new approach for assessing metadata completeness in open data portals." International Journal of Electronic Government Research (IJEGR) 18.1 (2022): 1-20.
