A curated list of awesome pipeline toolkits, inspired by Awesome Sysadmin.
- ActionChain - A workflow system for simple linear success/failure workflows.
- Adage - Small package to describe workflows that are not completely known at definition time.
- Airflow - Python-based workflow system created by Airbnb.
- Anduril - Component-based workflow framework for scientific data analysis.
- Antha - High-level language for biology.
- Bds - Scripting language for data pipelines.
- BioMake - GNU-Make-like utility for managing builds and complex workflows.
- BioQueue - Explicit framework with web monitoring and resource estimation.
- Bioshake - Haskell DSL built on Shake with strong typing and EDAM support.
- Bistro - Library to build and execute typed scientific workflows.
- Bpipe - Tool for running and managing bioinformatics pipelines.
- Briefly - Python Meta-programming Library for Job Flow Control.
- Cluster Flow - Command-line tool which uses common cluster managers to run bioinformatics pipelines.
- Clusterjob - Automated reproducibility and hassle-free submission of computational jobs to clusters.
- Compss - Programming model for distributed infrastructures.
- Conan2 - Light-weight workflow management application.
- Consecution - A Python pipeline abstraction inspired by Apache Storm topologies.
- Cosmos - Python library for massively parallel workflows.
- Cromwell - Workflow Management System geared towards scientific workflows from the Broad Institute.
- Cuneiform - Advanced functional workflow language and framework, implemented in Erlang.
- Dagobah - Simple DAG-based job scheduler in Python.
- Dagr - A Scala-based DSL and framework for writing and executing bioinformatics pipelines as Directed Acyclic GRaphs.
- Dask - Flexible parallel computing library for analytics.
- Dockerflow - Workflow runner that uses Dataflow to run a series of tasks in Docker.
- Doit - Task management & automation tool.
- Drake - Robust DSL akin to Make, implemented in Clojure.
- Drake R package - Reproducibility and high-performance computing with an easy R-focused interface. Unrelated to Factual's Drake.
- Dray - An engine for managing the execution of container-based workflows.
- Fission Workflows - A fast, lightweight workflow engine for serverless/FaaS functions.
- Flex - Language agnostic framework for building flexible data science pipelines (Python/Shell/Gnuplot).
- Flowr - Robust and efficient workflows using a simple language agnostic approach (R package).
- Gc3pie - Python libraries and tools for running applications on diverse grids and clusters.
- Gwf - Make-like utility for submitting workflows via qsub.
- Hive - System for creating and running pipelines on a distributed compute resource.
- HyperLoom - Platform for defining and executing workflow pipelines in large-scale distributed environments.
- Joblib - Set of tools to provide lightweight pipelining in Python.
- Jug - A task-based parallelization framework for Python.
- Ketrew - Embedded DSL in the OCaml language alongside a client-server management application.
- Kronos - Workflow assembler for cancer genome analytics and informatics.
- Loom - Tool for running bioinformatics workflows locally or in the cloud.
- Longbow - Job proxying tool for biomolecular simulations.
- Luigi - Python module that helps you build complex pipelines of batch jobs.
- Makeflow - Workflow engine for executing large complex workflows on clusters.
- Mara - A lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow.
- Mario - Scala library for defining data pipelines.
- Martian - A language and framework for developing and executing complex computational pipelines.
- MD Studio - Microservice-based workflow engine.
- Mistral - Python-based workflow engine by the OpenStack project.
- Moa - Lightweight workflows in bioinformatics.
- Nextflow - Flow-based computational toolkit for reproducible and scalable bioinformatics pipelines.
- Nipype - Workflows and interfaces for neuroimaging packages.
- OpenGE - Accelerated framework for manipulating and interpreting high-throughput sequencing data.
- Pachyderm - Distributed and reproducible data pipelining and data management, built on the container ecosystem.
- Parsl - Parallel Scripting Library.
- PipEngine - Ruby-based launcher for complex biological pipelines.
- Pinball - Python-based workflow engine by Pinterest.
- PyFlow - Lightweight parallel task engine.
- PypeFlow - Lightweight workflow engine for data analysis scripting.
- pyperator - Simple push-based Python workflow framework using asyncio, supporting recursive networks.
- pyppl - A lightweight Python pipeline framework.
- pypyr - Simple task runner for sequential steps defined in a pipeline YAML file, with AWS and Slack plug-ins.
- Pwrake - Parallel workflow extension for Rake.
- Qdo - Lightweight high-throughput queuing system for workflows with many small tasks to perform.
- Qsubsec - Simple tokenised template system for SGE.
- Rabix - Python-based workflow toolkit based on the Common Workflow Language and Docker.
- Rain - Framework for large distributed task-based pipelines, written in Rust with Python API.
- Ray - Flexible, high-performance distributed Python execution framework.
- Reflow - Language and runtime for distributed, incremental data processing in the cloud.
- Remake - Make-like declarative workflows in R.
- Rmake - Wrapper for the creation of Makefiles, enabling massive parallelization.
- Rubra - Pipeline system for bioinformatics workflows.
- Ruffus - Computation pipeline library for Python.
- Ruigi - Pipeline tool for R, inspired by Luigi.
- Sake - Self-documenting build automation tool.
- SciLuigi - Helper library for writing flexible scientific workflows in Luigi.
- SciPipe - Library for writing scientific workflows in Go.
- Scoop - Scalable Concurrent Operations in Python.
- Seqtools - Python library for lazy evaluation of pipelined transformations on indexable containers.
- Snakemake - Tool for running and managing bioinformatics pipelines.
- Spiff - Based on the Workflow Patterns initiative and implemented in Python.
- Stolos - Directed acyclic graph task dependency scheduler that simplifies distributed pipelines.
- Stpipe - File processing pipelines as a Python library.
- Sundial - Job system on AWS ECS or AWS Batch that manages dependencies and scheduling.
- Suro - Java-based distributed pipeline from Netflix.
- Swift - Fast, easy parallel scripting on multicores, clusters, clouds and supercomputers.
- Tibanna - Tool that helps you run genomic pipelines on the Amazon cloud.
- Toil - Distributed pipeline workflow manager (mostly for genomics).
- Yap - Extensible parallel framework, written in Python using OpenMPI libraries.
- Wallaroo - Framework for streaming data applications and algorithms that react to real-time events.
- WorldMake - Easy Collaborative Reproducible Computing.
- ActivePapers - Computational science made reproducible and publishable.
- Apache Airavata - Framework for executing and managing computational workflows on distributed computing resources.
- Arteria - Event-driven automation for sequencing centers. Initiates workflows based on events.
- Arvados - A container based workflow platform.
- Biokepler - Bioinformatics Scientific Workflow for Distributed Analysis of Large-Scale Biological Data.
- Butler - Framework for running scientific workflows on public and academic clouds.
- Chipster - Open source platform for data analysis.
- Clubber - Cluster Load Balancer for Bioinformatics e-Resources.
- Digdag - Workflow manager designed for simplicity, extensibility and collaboration.
- Fireworks - Centralized workflow server for dynamic workflows of high-throughput computations.
- Galaxy - Web-based platform for biomedical research.
- Kepler - Scientific workflow application from the University of California.
- KNIME Analytics Platform - General-purpose platform with many specialized domain extensions.
- NextflowWorkbench - Integrated development environment for Nextflow, Docker and Reusable Workflows.
- OpenMOLE - Workflow Management System for exploration of models and parameter optimization.
- Ophidia - Data-analytics platform with declarative workflows of distributed operations.
- Pegasus - Workflow Management System.
- Pentaho Kettle - Workflow platform with a graphical design environment.
- Piper - Distributed workflow engine designed to be dead simple.
- Polyaxon - A platform for machine learning experimentation workflow.
- Reana - Platform for reusable research data analyses developed by CERN.
- Sushi - Supporting User for SHell script Integration.
- Yabi - Online research environment for grid, HPC and cloud computing.
- Taverna - Domain independent workflow system.
- VisTrails - Scientific workflow and provenance management system.
- Wings - Semantic workflow system utilizing Pegasus as execution system.
- Watchdog - Workflow management system for the automated and distributed analysis of large-scale experimental data.
- Common Workflow Language
- Cloudgene Workflow Language
- OpenMOLE DSL
- Workflow Description Language
- Yet Another Workflow Language
- Pipelines
- Workflow 4 Ever Initiative
- Workflow 4 Ever workflow research object model
- Workflow Patterns Initiative
- Workflow Patterns Library
- ResearchObject.org
- Beaker - Notebook-style development environment.
- Binder - Turn a GitHub repo into a collection of interactive notebooks powered by Jupyter and Kubernetes.
- IPython - A rich architecture for interactive computing.
- Jupyter - Language-agnostic notebook literate programming environment.
- Pathomx - Interactive data workflows built on Python.
- R Notebooks - R Markdown notebook literate programming environment.
- SoS - Readable, interactive, cross-platform and cross-language data science workflow system.
- Zeppelin - Web-based notebook that enables interactive data analytics.
- Cadence - Distributed, scalable, durable, and highly available orchestration engine developed by Uber.
- LinkedPipes ETL - Linked Data publishing and consumption ETL tool.
- Kiba ETL - A data processing & ETL framework for Ruby.
- Argo - Get stuff done with container-native workflows for Kubernetes.
- Deis - Workflow system to create and manage applications on Kubernetes.
- Bazel - Build software just as engineers do at Google.
- DoIt - Highly generalized task-management and automation in Python.
- Gradle - Unified cross-platform builds.
- Scons - Python library focused on C/C++ builds.
- Shake - Define robust build systems akin to GNU Make using Haskell.
- Make - The GNU Make build system.
- HPC Grid Runner
- noWorkflow - Supporting infrastructure to run scientific experiments without a scientific workflow management system, and still get things like provenance.
- Reprozip - Simplifies the process of creating reproducible experiments from command-line executions.
- Awesome streaming - Curated list of awesome streaming frameworks and applications.
- Awesome ETL - Curated list of notable ETL (extract, transform, load) frameworks, libraries and software.
- Computational Data Analysis Workflow Systems
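Many of the entries above (for example Dagobah, Dagr and Stolos) model a pipeline as a directed acyclic graph of tasks. The following is a tool-agnostic sketch, not the API of any project listed here. The hypothetical `run_pipeline` helper uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to start each task as soon as all of its upstream tasks have finished, so independent tasks run in parallel on a thread pool. The `(dependencies, action)` task format and the example task names are assumptions made for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical task spec: name -> (names of upstream tasks, action).
# An action receives a dict of its upstream results and returns its own result.
Tasks = dict[str, tuple[list[str], Callable[[dict], object]]]


def run_pipeline(tasks: Tasks, max_workers: int = 4) -> dict[str, object]:
    """Run each task once all of its dependencies have finished."""
    sorter = TopologicalSorter({name: deps for name, (deps, _) in tasks.items()})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results: dict[str, object] = {}
    in_flight = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are all done.
            for name in sorter.get_ready():
                deps, action = tasks[name]
                upstream = {d: results[d] for d in deps}
                in_flight[pool.submit(action, upstream)] = name
            # Block until at least one running task finishes.
            finished, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in finished:
                name = in_flight.pop(future)
                results[name] = future.result()  # re-raises any task error
                sorter.done(name)  # may make downstream tasks ready
    return results


# Example: "sort" and "total" both depend only on "download", so they can run concurrently.
pipeline = {
    "download": ([], lambda up: [3, 1, 2]),
    "sort": (["download"], lambda up: sorted(up["download"])),
    "total": (["download"], lambda up: sum(up["download"])),
    "report": (["sort", "total"], lambda up: f"{up['sort']} sum={up['total']}"),
}
print(run_pipeline(pipeline)["report"])  # [1, 2, 3] sum=6
```

This only covers dependency ordering. The workflow managers listed here add further features on top, such as caching of completed outputs, retries, and submission to clusters or clouds.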