Open Space for Machine Learning (Os4ML) is an open source platform for automated machine learning. The goal is to enable non-experts to solve everyday problems with AI. It automates all steps on the way to the finished model with the help of an intuitive UI/UX. The alpha release supports the following features:
- Create a databag from Excel sheets, CSV files and ZIP files with images
- Create a solution using Ludwig for categorization and regression problems
- Integration of external storage, e.g. Shepard, using Python scripts
- Multi-User Isolation
The current version is an alpha release, i.e. everything is work in progress and experimental. For more details please read the docs.
- Directory overview
- Usage
- Testing
- Deployment
- Releasing
- Monitoring
- Logging
- About the project
The repository contains several projects. The folder structure below gives an overview of each one:
├── [+] gitlab/ # CI/CD config
├── [+] manifests/ # K8s manifests for os4ml services
├── [+] services/ # Projects of the repository
│   ├── frontend/ # OS4ML App, built with Angular
│   ├── job-manager/ # FastAPI service for managing the execution of the ML pipelines
│   ├── keycloak/ # Templates and themes for [keycloak](https://www.keycloak.org/)
│   ├── model-manager/ # FastAPI service to manage the main models
│   ├── oas/ # Collection of all OpenAPI specs of the services and templates for the code generation
│   ├── objectstore-manager/ # FastAPI service for user-isolated file management
│   └── workflow-translator/ # FastAPI service that manages the ML pipelines
├── [+] templates/ # Code for the ML pipelines
└── ... # Other files
- Clone this repository locally
git clone ssh://git@gitlab.wogra.com:8022/developer/wogra/os4ml.git
- Navigate to the desired project (`services/<<SERVICE>>`) and follow the project-specific documentation (`README` file)
This is the list of available repos:
This is a quick overview of the kinds of tests implemented in each project (you'll find more detail in the project-specific documentation):
Project | Unit | Integration | E2E | Screenshot |
---|---|---|---|---|
/frontend | โ | โ | โ | โ |
/job-manager | โ | โ | โ | โ |
/model-manager | โ | โ | โ | โ |
/objectstore-manager | โ | โ | โ | โ |
/workflow-translator | โ | โ | โ | โ |
The deployment process is managed by GitLab. Just execute the corresponding pipeline steps. They build the Docker images and tag them. The Argo CD image updater notices the new image and notifies Argo CD to deploy the new version.
The pipeline is formed by different stages, executed in this order:
- Prepare
- Test
- Build
- Deploy
- Reset
- E2E
Prepare (automatic): Validates, in each of the API projects, that the OAS file is well written.
Test (automatic): Lints the project files and runs the unit and integration tests.
Build (automatic): Generates the application images (with Docker).
Deploy (manual): Loads the built images and tags them to deploy them to the different environments:
- feature
- dev
- testing
- release (only available on the `rc` branch)
- staging (only available on the `main` branch)
- prod (only available on the `main` branch and when a new tag is created)
Reset (automatic, only when the deploy to the `testing` environment is run): Resets the testing stage so each run starts from scratch.
E2E (automatic, only when the deploy to the `testing` environment is run): Runs the e2e and frontend integration tests.
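The branch and stage rules described above can be summarized in a short Python sketch. This is only an illustration of the rules as stated in this README, not the actual GitLab CI configuration: the function names are hypothetical, and the sketch assumes that `feature`, `dev` and `testing` are offered on every branch, which the text does not state explicitly.

```python
from typing import List, Optional

# Assumption: these targets are offered on every branch.
BASE_ENVIRONMENTS = ["feature", "dev", "testing"]


def available_environments(branch: str, is_new_tag: bool = False) -> List[str]:
    """Return the deploy targets offered for a branch (hypothetical helper)."""
    envs = list(BASE_ENVIRONMENTS)
    if branch == "rc":
        envs.append("release")  # release is only available on rc
    if branch == "main":
        envs.append("staging")  # staging is only available on main
        if is_new_tag:
            envs.append("prod")  # prod additionally needs a new tag
    return envs


def stages_for_run(deploy_target: Optional[str]) -> List[str]:
    """Return the stages that run, in order, for a chosen deploy target.

    Deploy is manual, so deploy_target is None when nobody triggers it.
    """
    stages = ["Prepare", "Test", "Build"]
    if deploy_target is not None:
        stages.append("Deploy")
        if deploy_target == "testing":
            # Reset and E2E only follow a deploy to the testing environment.
            stages += ["Reset", "E2E"]
    return stages


print(available_environments("rc"))
print(stages_for_run("testing"))
```

Keeping the rules in two small pure functions makes it easy to check which targets a branch should offer before looking at the pipeline itself.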
We follow the Gitflow workflow. However, since this project is in its early stages, we treat fixes like new features. Use the following steps to create and deploy a new release:
- At some point, merge the `dev` branch into `rc`.
- Deploy to the release stage and test the functionality. If issues arise, fix them and merge the fixes into the `rc` branch as well.
- If everything works, merge the `rc` branch into `main`.
- Deploy to staging and test again (the staging environment uses the same infrastructure as the production environment).
- Create a new tag for the new version of the `main` branch and make sure the CHANGELOG is updated.
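As a rough illustration of the merge and tag steps above, the following Python sketch builds the corresponding git commands as a dry run, without executing anything. The function name, the `origin` remote, the `--no-ff` merge option and the `v` tag prefix are assumptions made for the example and are not prescribed by this project. The deploy and test steps happen in the pipeline and appear only as comments.

```python
from typing import List


def release_commands(version: str, remote: str = "origin") -> List[List[str]]:
    """Return the git commands for the merge and tag steps, in order (dry run)."""
    tag = f"v{version}"  # assumption: tags carry a "v" prefix
    return [
        ["git", "checkout", "rc"],
        ["git", "merge", "--no-ff", "dev"],  # merge dev into rc
        ["git", "push", remote, "rc"],
        # Deploy to the release stage and test before continuing.
        ["git", "checkout", "main"],
        ["git", "merge", "--no-ff", "rc"],  # merge rc into main
        ["git", "push", remote, "main"],
        # Deploy to staging, test again and update the CHANGELOG before tagging.
        ["git", "tag", "-a", tag, "-m", f"Release {tag}"],
        ["git", "push", remote, tag],
    ]


for command in release_commands("1.2.3"):
    print(" ".join(command))
```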
k9s is a great tool to manage and monitor your Kubernetes cluster when running locally. Otherwise, use the monitoring capabilities of your cloud provider.
You can check the logs of each pod directly by using `kubectl` or k9s. Alternatively, you can collect the pod logs automatically by deploying Fluent Bit to the cluster.
Furthermore, you can access the logs of the services through the argocd UI and the logs of the ML pipelines through the kubeflow UI.
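For scripted access to pod logs, a minimal Python sketch around `kubectl logs` could look like the following. The helper names, the `os4ml` namespace and the pod name `job-manager-0` are assumptions for illustration; check the actual names in your cluster. Only the command construction is exercised in the example, so it runs without a cluster.

```python
import subprocess
from typing import List, Optional


def kubectl_logs_command(
    pod: str,
    namespace: str = "os4ml",  # assumption: adjust to your namespace
    tail: Optional[int] = 100,
    follow: bool = False,
) -> List[str]:
    """Build a `kubectl logs` command line for one pod (hypothetical helper)."""
    cmd = ["kubectl", "logs", pod, "--namespace", namespace]
    if tail is not None:
        cmd.append(f"--tail={tail}")  # limit output to the last N lines
    if follow:
        cmd.append("--follow")  # stream new log lines as they arrive
    return cmd


def fetch_logs(pod: str, namespace: str = "os4ml", tail: int = 100) -> str:
    """Run kubectl and return the log text; needs kubectl and cluster access."""
    result = subprocess.run(
        kubectl_logs_command(pod, namespace, tail),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical pod name; only prints the command, does not contact a cluster.
print(" ".join(kubectl_logs_command("job-manager-0")))
```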
The project focuses on easy installation, intuitive UI/UX and comfortable machine learning. So we do not reinvent the wheel. Whenever possible, we use third-party open source software.
There is a lot of work to do. In the near future the following will happen:
- A Terraform module to install Os4ML on a k3d cluster using ArgoCD
- Solving regression problems (Winter 2022)
- Solving multi output problems (Spring 2022)
- Adding Transfer Learning Support (Summer 2023)
- Support for Model Sharing (Fall 2023)
- Suggestions for Transfer learning (Winter 2023)
- Data visualizations (Spring 2024)
- Intelligent Data Labeling (Summer 2024)
If you are interested in contributing, have questions, comments, or thoughts to share, or if you just want to be in the know, please consider joining the Os4ML Slack.
If you are using Os4ML for a scientific project, please cite the following paper:
Rall, D., Bauer, B., & Fraunholz, T. (2023). Towards Democratizing AI: A Comparative Analysis of AI as a Service Platforms and the Open Space for Machine Learning Approach. Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing, 34–39. https://doi.org/10.1145/3616131.3616136
@inproceedings{rall2023towards,
author={Rall, Dennis and Bauer, Bernhard and Fraunholz, Thomas},
title={Towards Democratizing AI: A Comparative Analysis of AI as a Service Platforms and the Open Space for Machine Learning Approach},
year={2023},
isbn={9798400707339},
publisher={Association for Computing Machinery},
address={New York, NY, USA},
url={https://doi.org/10.1145/3616131.3616136},
doi={10.1145/3616131.3616136},
booktitle={Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing},
pages={34--39},
numpages={6},
keywords={AI-as-a-Service, Cloud Computing, Platform, Artificial Intelligence},
location={Manchester, United Kingdom},
series={ICCBDC '23}
}
Rall, D., Fraunholz, T., & Bauer, B. (2023). AI-Democratization: From Data-first to Human-first AI. Central European Conference on Information and Intelligent Systems, 261–267.
@inproceedings{rall2023ai,
title={AI-Democratization: From Data-first to Human-first AI},
author={Rall, Dennis and Fraunholz, Thomas and Bauer, Bernhard},
booktitle={Central European Conference on Information and Intelligent Systems},
pages={261--267},
year={2023},
organization={Faculty of Organization and Informatics Varazdin}
}
Os4ML is a project of the WOGRA AG research group in cooperation with the German Aerospace Center and is funded by the Ministry of Economic Affairs, Regional Development and Energy as part of the High Tech Agenda of the Free State of Bavaria.
Os4ML is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.