Flyte is a container-native, type-safe workflow and pipelines platform, written in Golang and optimized for large-scale processing and machine learning. Workflows can be written in any language, with out-of-the-box support for Python.

Homepage

Introduction

Flyte is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine. It provides workflows as a core concept, but it also provides a single unit of execution, the task, as a top-level concept. Multiple tasks arranged in data producer-consumer order create a workflow. Flyte workflows are pure specification and can be created using any language. Every task can also be written in any language. We provide first-class support for Python, which makes Flyte well suited to modern machine learning and data processing pipelines.
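To make the producer-consumer idea concrete, here is a minimal sketch in plain Python. It does not use flytekit or any Flyte API. The task functions, the `WORKFLOW` dictionary, and `run_workflow` are hypothetical names chosen for illustration, and the standard-library `graphlib` module (Python 3.9+) stands in for a real execution engine.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: plain functions standing in for containerized tasks.
def load_numbers() -> list[int]:
    return [1, 2, 3, 4]

def square_all(numbers: list[int]) -> list[int]:
    return [n * n for n in numbers]

def total(numbers: list[int]) -> int:
    return sum(numbers)

# The workflow is pure specification: each node names its task and the
# upstream nodes whose outputs it consumes.
WORKFLOW = {
    "load": (load_numbers, []),
    "square": (square_all, ["load"]),
    "total": (total, ["square"]),
}

def run_workflow(spec):
    # Order the nodes so that every producer runs before its consumers.
    graph = {name: deps for name, (_, deps) in spec.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = spec[name]
        # Feed each task the outputs of the nodes it depends on.
        outputs[name] = fn(*(outputs[d] for d in deps))
    return outputs

results = run_workflow(WORKFLOW)
print(results["total"])
```

Keeping the specification separate from the functions is the point of the sketch: the graph can be inspected, stored, and versioned without running any task.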

Resources

Resources that would help you get a better understanding of Flyte.

Communication channels

Biweekly Community Sync

Starting April 21 2020, the Flyte community meets every other Tuesday at 9:00 AM PST (US West coast time).
You can join using the Zoom link.
Meeting notes are captured in the Doc.
Demo Signup Sheet

Conference Talks

Kubecon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform video | deck
Kubecon 2019 - Running LargeScale Stateful workloads on Kubernetes at Lyft video
re:invent 2019 - Implementing ML workflows with Kubernetes and Amazon Sagemaker video
Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS video
OSS + ELC NA 2020 splash
Datacouncil splash

Blog Posts

Introducing Flyte: A Cloud Native Machine Learning and Data Processing Platform

Podcasts

TWIML&AI - Scalable and Maintainable ML Workflows at Lyft - Flyte
Software Engineering Daily - Flyte: Lyft Data Processing Platform

Features

Used at scale in production by 500+ users at Lyft, with more than 900k workflows executed per month and more than 30 million container executions per month
Centralized Inventory of Tasks, Workflows and Executions
Single Task Execution support - Start executing a task and then convert it to a workflow
gRPC / REST interface to define and execute tasks and workflows
Type-safe construction of pipelines: each task has an interface characterized by its inputs and outputs, so illegal construction of pipelines fails during declaration rather than at runtime
Types that help in creating machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
Memoization and Lineage tracking
Workflow features

Multiple Schedules for every workflow
Parallel step execution
Extensible Backend to add customized plugin experiences (with simplified User experiences)
Arbitrary container execution
Branching
Inline Subworkflows (a workflow can be embedded within one node of the top-level workflow)
Distributed Remote Child workflows (a remote workflow can be triggered and statically verified at compile time)
Array Tasks (map some function over a large dataset, controlled execution of thousands of containers)
Dynamic Workflow creation and execution - with runtime type safety
Container-side plugins with first-class support in Python
Pre-alpha: arbitrary flytekit-less containers supported (RawContainer)
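The Array Tasks item above describes mapping a function over a dataset with controlled parallelism. The following plain-Python sketch illustrates that pattern only. It is not Flyte's implementation or API: `process_item`, `map_task`, and `max_parallelism` are hypothetical names, and local threads stand in for containers.

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(x: int) -> int:
    # Hypothetical per-item task body.
    return x * 2

def map_task(fn, inputs, max_parallelism=4):
    # Run fn over every input with at most max_parallelism workers at once.
    # pool.map yields results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        return list(pool.map(fn, inputs))

doubled = map_task(process_item, range(10), max_parallelism=3)
print(doubled)
```

The concurrency cap is what "controlled execution" refers to in this sketch: the dataset can be far larger than the number of workers running at any moment.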

Maintain an inventory of tasks and workflows
Record the history of all executions; executions that follow convention are completely repeatable
Multi-cloud support (AWS, GCP and others)
Extensible core
Modularized
Automated notifications to Slack, Email, Pagerduty
Deep observability
Multi K8s cluster support
Comes with support for many systems out of the box on K8s, such as Spark
Snappy Console
Python CLI
Written in Golang and optimized for performance of large running jobs
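The feature list above states that each task has a typed interface and that illegal pipeline construction fails during declaration rather than at runtime. The sketch below shows one way such a check could look in plain Python, using only function annotations. It is an illustration under assumptions, not Flyte's compiler: `task_interface`, `connect`, and the three example tasks are hypothetical names.

```python
from typing import get_type_hints

def task_interface(fn):
    # Split a function's annotations into (input types, output type).
    hints = get_type_hints(fn)
    output = hints.pop("return", None)
    return hints, output

def connect(producer, consumer, input_name):
    # Validate an edge when the pipeline is declared; no task is executed.
    _, out_type = task_interface(producer)
    in_types, _ = task_interface(consumer)
    expected = in_types[input_name]
    if out_type != expected:
        raise TypeError(
            f"{producer.__name__} produces {out_type}, but "
            f"{consumer.__name__}.{input_name} expects {expected}"
        )
    return (producer.__name__, consumer.__name__, input_name)

def count_rows() -> int:
    return 42

def describe(n: int) -> str:
    return f"{n} rows"

def normalize(text: str) -> str:
    return text.strip().lower()

# int output wired to an int input: accepted.
edge = connect(count_rows, describe, "n")

# int output wired to a str input: rejected before anything runs.
try:
    connect(count_rows, normalize, "text")
    rejected = False
except TypeError as err:
    rejected = True
    print(err)
```

A real system would also need to handle structured types such as blobs, schemas, collections, and maps; the exact-match comparison here is deliberately simplistic.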

In Progress

Golang CLI - flytectl

Coming Soon

Reactive pipelines
Grafana templates (user/system observability)
More integrations

Available Plugins

Containers
K8s Pods
AWS Batch Arrays
K8s Pod arrays
K8s Spark (native pyspark and java/scala)
Qubole Hive
Presto Queries
Distributed Pytorch (K8s Native) - Pytorch Operator
Sagemaker (builtin algorithms & custom models)
Distributed Tensorflow (K8s Native) - TFOperator
Papermill Notebook execution (python and spark)

Coming soon

Flink-K8s

Current Usage

Changelogs

Component Repos

Repo	Language	Purpose	Status
flyte	Kustomize,RST	deployment, documentation, issues	Production-grade
flyteidl	Protobuf	interface definitions	Production-grade
flytepropeller	Go	execution engine	Production-grade
flyteadmin	Go	control plane	Production-grade
flytekit	Python	Python SDK and tools	Production-grade
flyteconsole	Typescript	admin console	Production-grade
datacatalog	Go	manage input & output artifacts	Production-grade
flyteplugins	Go	flyte plugins	Production-grade
flytestdlib	Go	standard library	Production-grade
flytesnacks	Python	examples, tips, and tricks	Incubating
flytekit-java	Java/Scala	Java & Scala SDK for authoring Flyte workflows	Incubating
flytectl	Go	Golang CLI for Flyte	Incomplete

Production K8s Operators

Repo	Language	Purpose
Spark	Go	Apache Spark batch
Flink	Go	Apache Flink streaming

Top Contributors

Thank you to the community for making Flyte possible.

seetharamireddy540/flyte