Awesome MLOps

A curated list of awesome MLOps tools.

Inspired by awesome-python.


CI/CD for Machine Learning

Tools for performing CI/CD for Machine Learning.

  • CML - Open-source library for implementing CI/CD in machine learning projects.

Cron Job Monitoring

Tools for monitoring cron jobs (recurring jobs).

Data Exploration

Tools for performing data exploration.

  • Apache Zeppelin - Notebook that enables data-driven, interactive data analytics and collaborative documents.
  • Google Colab - Hosted Jupyter notebook service that requires no setup to use.
  • Jupyter Notebook - Web-based notebook environment for interactive computing.
  • JupyterLab - The next-generation user interface for Project Jupyter.
  • Jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts.
  • Polynote - The polyglot notebook with first-class Scala support.

Data Management

Tools for performing data management.

  • Arrikto - Dead simple, ultra fast storage for the hybrid Kubernetes world.
  • DVC - Management and versioning of datasets and machine learning models.
  • Intake - A lightweight set of tools for loading and sharing data in data science projects.

Data Processing

Tools related to data processing and data pipelines.

  • Airflow - Platform to programmatically author, schedule, and monitor workflows.
  • Hadoop - Framework that allows for the distributed processing of large data sets across clusters.
  • Spark - Unified analytics engine for large-scale data processing.

Data Visualization

Tools for data visualization, reports and dashboards.

  • Data Studio - Reporting solution for power users who want to go beyond the data and dashboards of Google Analytics.
  • Metabase - The simplest, fastest way to get business intelligence and analytics to everyone.
  • Redash - Connect to any data source, easily visualize, dashboard and share your data.
  • Superset - Modern, enterprise-ready business intelligence web application.
  • Tableau - Powerful, fast-growing data visualization tool used in the business intelligence industry.

Feature Store

Feature store tools for data serving.

  • Butterfree - A tool for building feature stores. Transform your raw data into beautiful features.
  • Feast - End-to-end open source feature store for machine learning.

Hyperparameter Tuning

Tools and libraries to perform hyperparameter tuning.

  • Hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
  • Katib - Kubernetes-based system for hyperparameter tuning and neural architecture search.
  • Optuna - Open source hyperparameter optimization framework to automate hyperparameter search.
  • Tune - Python library for experiment execution and hyperparameter tuning at any scale.
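
To make the idea concrete, here is a minimal sketch of random search, the simplest strategy that tools in this category automate and scale. It uses only the Python standard library and is not the API of any tool listed above. The `objective` function, the `lr` and `depth` parameter names, and their ranges are hypothetical stand-ins for a real validation loss.

```python
import random


def objective(params):
    """Toy validation loss (hypothetical): smallest at lr=0.1, depth=5."""
    return (params["lr"] - 0.1) ** 2 + 0.01 * (params["depth"] - 5) ** 2


def random_search(objective, n_trials=200, seed=0):
    """Sample n_trials configurations and return the best (params, loss) pair."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-3, 0),  # log-uniform between 0.001 and 1
            "depth": rng.randint(2, 10),     # integer range, both ends included
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(objective)
print(best_params, best_loss)
```

The listed libraries typically add smarter samplers, early stopping, and distributed execution on top of this basic loop.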

Knowledge Sharing

Tools for sharing knowledge with the entire team/company.

  • Knowledge Repo - Knowledge sharing platform for data scientists and other technical professions.
  • Kyso - One place for data insights so your entire team can learn from your data.

Machine Learning Platform

Complete machine learning platform solutions.

  • Algorithmia - Securely govern your machine learning operations with a healthy ML lifecycle.
  • Allegro AI - Transform ML/DL research into products. Faster.
  • CNVRG - An end-to-end machine learning platform to build and deploy AI models at scale.
  • Cubonacci - Intuitive code-first MLOps platform that streamlines the end-to-end machine learning workflow.
  • DAGsHub - A platform built on open source tools for data, model and pipeline management.
  • Dataiku - Platform democratizing access to data and enabling enterprises to build their own path to AI.
  • DataRobot - AI platform that democratizes data science and automates the end-to-end ML at scale.
  • Domino - One place for your data science tools, apps, results, models, and knowledge.
  • Gradient - Multicloud CI/CD and MLOps platform for machine learning teams.
  • H2O - Open source leader in AI with a mission to democratize AI for everyone.
  • Hopsworks - Open-source platform for developing and operating machine learning models at scale.
  • Iguazio - Data science platform that automates MLOps with end-to-end machine learning pipelines.
  • KNIME - Create and productionize data science using one easy and intuitive environment.
  • Kubeflow - Making deployments of ML workflows on Kubernetes simple, portable and scalable.
  • LynxKite - A complete graph data science platform for very large graphs and other datasets.
  • ML Workspace - All-in-one web-based IDE specialized for machine learning and data science.
  • Modzy - AI platform and marketplace offering scalable, secure, and ready-to-deploy AI models.
  • Pachyderm - Combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise.
  • Polyaxon - A platform for reproducible and scalable machine learning and deep learning on Kubernetes.
  • SageMaker - Fully managed service that provides the ability to build, train, and deploy ML models quickly.
  • Valohai - Takes you from POC to production while managing the whole model lifecycle.

Model Lifecycle

Tools for managing model lifecycle (tracking experiments, parameters and metrics).

  • Comet - Track your datasets, code changes, experimentation history, and models.
  • MLflow - Open source platform for the machine learning lifecycle.
  • ModelDB - Open source ML model versioning, metadata, and experiment management.
  • Neptune AI - The most lightweight experiment management tool that fits any workflow.
  • Sacred - A tool to help you configure, organize, log and reproduce experiments.

Model Serving

Tools for serving models in production.

  • BentoML - Open-source platform for high-performance ML model serving.
  • Cortex - Machine learning model serving infrastructure.
  • GraphPipe - Machine learning model deployment made simple.
  • KFServing - Kubernetes custom resource definition for serving ML models on arbitrary frameworks.
  • PredictionIO - Event collection, deployment of algorithms, evaluation, querying predictive results via APIs.
  • Seldon - Take your ML projects from POC to production with maximum efficiency and minimal risk.
  • Streamlit - Lets you create apps for your ML projects with deceptively simple Python scripts.
  • TensorFlow Serving - Flexible, high-performance serving system for ML models, designed for production.
  • TorchServe - A flexible and easy to use tool for serving PyTorch models.

Optimization Tools

Optimization tools related to model scalability in production.

  • Dask - Provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
  • Fiber - Python distributed computing library for modern computer clusters.
  • Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  • Mahout - Distributed linear algebra framework and mathematically expressive Scala DSL.
  • MLlib - Apache Spark's scalable machine learning library.
  • Modin - Speed up your Pandas workflows by changing a single line of code.
  • Petastorm - Enables single machine or distributed training and evaluation of deep learning models.
  • RAPIDS - Gives the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.
  • Ray - Fast and simple framework for building and running distributed applications.
  • SINGA - Apache top-level project focusing on distributed training of DL and ML models.
  • TPOT - Automated ML tool that optimizes machine learning pipelines using genetic programming.

Simplification Tools

Tools related to machine learning simplification and standardization.

  • Hermione - Helps data scientists set up better-organized code more quickly and simply.
  • Koalas - Pandas API on Apache Spark. Makes data scientists more productive when interacting with big data.
  • Ludwig - Allows users to train and test deep learning models without the need to write code.
  • PyCaret - Open source, low-code machine learning library in Python.
  • Turi Create - Simplifies the development of custom machine learning models.

Workflow Tools

Tools and frameworks to create workflows or pipelines in the machine learning context.

  • Argo - Open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
  • Flyte - Makes it easy to create concurrent, scalable, and maintainable workflows for machine learning.
  • Kale - Aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows.
  • Kedro - Library that implements software engineering best practices for data and ML pipelines.
  • Metaflow - Human-friendly library that helps scientists and engineers build and manage data science projects.
  • Prefect - A workflow management system, designed for modern infrastructure.
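
As a rough illustration of what these tools orchestrate, the sketch below runs a three-step pipeline in dependency order using `graphlib` from the Python standard library (Python 3.9+). It is not the API of any tool listed above. The task names `extract`, `transform`, and `train`, and the `run_pipeline` helper, are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have finished."""
    # dependencies maps a task name to the set of tasks it depends on.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Hand each task the outputs of its direct upstream tasks.
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Hypothetical pipeline: load numbers, double them, then "train" by averaging.
tasks = {
    "extract": lambda up: [1, 2, 3, 4],
    "transform": lambda up: [x * 2 for x in up["extract"]],
    "train": lambda up: sum(up["transform"]) / len(up["transform"]),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
}

order, results = run_pipeline(tasks, dependencies)
print(order, results["train"])
```

Real workflow engines usually add scheduling, retries, caching, and parallel or containerized execution around the same dependency-graph idea.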

Resources

Where to discover new tools and discuss existing ones.

Articles

Other Lists

Podcasts

Slack

Websites

Contributing

All contributions are welcome! Please take a look at the contribution guidelines first.