In the fast-evolving world of data science and AI, ensuring that workflows are reproducible, portable, and scalable is essential for success. However, many modern tools prioritize ease of use over standardization, making it difficult to share and execute workflows across different environments.
The SEDIMARK team tackled this challenge by creating a transformation methodology to convert Mage.ai workflows into Common Workflow Language (CWL) and Python-based workflows. This work aims to ensure compatibility with industry standards, improving the portability, reproducibility, and scalability of data pipelines. By integrating standardized workflows, SEDIMARK enables organizations to confidently manage their AI pipelines in secure, decentralized environments.
Why transform Mage.ai workflows to CWL?
Mage.ai is a powerful tool for building data pipelines quickly and easily. However, its native workflows are not compatible with industry-standard formats like CWL, which are essential for reproducibility and portability.
CWL (Common Workflow Language) is an open standard that ensures workflows can be shared, reproduced, and executed across different platforms. It’s widely used in fields like bioinformatics and data science to standardize workflows for deployment in cloud environments, HPC clusters, and edge computing platforms.
By converting Mage.ai pipelines to CWL, organizations participating in SEDIMARK can achieve:
- Reproducibility by ensuring that workflows produce consistent results across different systems.
- Portability by allowing workflows to be moved between local machines, cloud platforms, and decentralized environments.
- Interoperability through seamless integration with other tools and platforms, such as Apache Airflow or Nextflow.
How SEDIMARK achieved the transformation
The SEDIMARK team developed a two-step methodology to transform Mage.ai pipelines into standardized workflows:
- Mage to Python Transformation:
  The first step converts Mage blocks into standalone Python scripts, ensuring compatibility with CWL and other execution environments. This involves:
  - Removing Mage-specific dependencies.
  - Formatting the code to follow standard Python conventions.
  - Ensuring the Python scripts can be executed independently.
- Python to CWL Transformation:
  Once the Mage blocks are converted to Python, they are wrapped in CWL workflows. This involves:
  - Creating a CWL tool definition for each Python script.
  - Assembling a complete CWL workflow that links the individual steps.
  - Generating a ZIP package containing all the files needed to execute the workflow.
This transformation enables organizations to leverage SEDIMARK’s decentralized marketplace while ensuring that their data pipelines remain compatible with industry standards.
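The second step, wrapping converted scripts in CWL and packaging them, might look like the following sketch. It assumes a linear pipeline (each block consumes the previous block's output), scripts that accept an optional input file followed by an output file path, and a host with `python` and the required libraries installed. The CWL documents are serialized as JSON, which is valid YAML and therefore readable as CWL. The function names, the fixed `output.pkl` file name, and the ZIP layout are hypothetical choices for illustration.

```python
import io
import json
import zipfile
from pathlib import Path


def tool_definition(takes_input: bool) -> dict:
    """CWL CommandLineTool that runs `python <script> [<input>] output.pkl`."""
    inputs = {"script": {"type": "File", "inputBinding": {"position": 1}}}
    if takes_input:
        inputs["input_data"] = {"type": "File", "inputBinding": {"position": 2}}
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": "python",
        "inputs": inputs,
        # Fixed output file name, placed after the script and input arguments.
        "arguments": [{"position": 3, "valueFrom": "output.pkl"}],
        "outputs": {
            "output_data": {"type": "File", "outputBinding": {"glob": "output.pkl"}}
        },
    }


def workflow_definition(script_names: list[str]) -> dict:
    """CWL Workflow chaining the steps linearly, in the given order."""
    if not script_names:
        raise ValueError("at least one script is required")
    inputs, steps, previous = {}, {}, None
    for fname in script_names:
        step = Path(fname).stem
        inputs[f"{step}_script"] = "File"
        step_in = {"script": f"{step}_script"}
        if previous is not None:  # feed the upstream step's output forward
            step_in["input_data"] = f"{previous}/output_data"
        steps[step] = {"run": f"{step}.cwl", "in": step_in, "out": ["output_data"]}
        previous = step
    return {
        "cwlVersion": "v1.2",
        "class": "Workflow",
        "inputs": inputs,
        "outputs": {"result": {"type": "File", "outputSource": f"{previous}/output_data"}},
        "steps": steps,
    }


def build_package(scripts: dict[str, str]) -> bytes:
    """Zip the scripts, one tool per script, the workflow, and a job file."""
    names = list(scripts)
    workflow = workflow_definition(names)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, (fname, source) in enumerate(scripts.items()):
            zf.writestr(fname, source)
            tool = tool_definition(takes_input=i > 0)
            zf.writestr(f"{Path(fname).stem}.cwl", json.dumps(tool, indent=2))
        zf.writestr("workflow.cwl", json.dumps(workflow, indent=2))
        job = {f"{Path(f).stem}_script": {"class": "File", "path": f} for f in names}
        zf.writestr("job.json", json.dumps(job, indent=2))
    return buf.getvalue()
```

Writing the result with `Path("pipeline.zip").write_bytes(build_package({...}))` and unzipping it gives a directory that a CWL runner such as the reference implementation `cwltool` can execute with `cwltool workflow.cwl job.json`. Mage pipelines that branch or merge would need `in` mappings derived from each block's upstream dependencies instead of the simple chain used here.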
Why this matters for SEDIMARK and beyond
The SEDIMARK project aims to establish a distributed data marketplace, where organizations can securely exchange data pipelines, AI models, and other digital assets. The transformation from Mage.ai to CWL ensures that these assets are portable, reproducible, and compatible with existing standards.
For data scientists, engineers, and organizations participating in SEDIMARK, this transformation bridges the gap between intuitive pipeline design and standardized execution, enabling scalable and secure data processing.