Flyte is a production-grade, container-native, type-safe workflow and pipelines platform, written in Golang and optimized for large-scale processing and machine learning. Workflows can be written in any language, with out-of-the-box support for Python, Java, and Scala.
Flyte is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.
Flyte is more than a workflow engine. It provides workflows as a core concept, but it also provides a single unit of execution, the task, as a top-level concept. Multiple tasks arranged in data producer-consumer order create a workflow. Flyte workflows are pure specification and can be created using any language. Every task can also be written in any language. We provide first-class support for Python, making Flyte well suited to modern machine learning and data processing pipelines.
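To make the task and workflow relationship concrete, here is a minimal plain-Python sketch. It does not use flytekit or any Flyte API: the names `Task`, `Workflow`, `add`, `connect`, and `run` are hypothetical and exist only for illustration. Each task exposes a typed interface taken from its function annotations, and wiring a producer to a consumer with mismatched types fails when the workflow is declared, before anything runs.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Task:
    """A single unit of execution with a typed interface (illustrative only)."""
    fn: Callable[..., Any]

    @property
    def name(self) -> str:
        return self.fn.__name__

    @property
    def inputs(self) -> Dict[str, Any]:
        # Input names and types come from the function's annotations.
        sig = inspect.signature(self.fn)
        return {p.name: p.annotation for p in sig.parameters.values()}

    @property
    def output(self) -> Any:
        return inspect.signature(self.fn).return_annotation


@dataclass
class Workflow:
    """Tasks arranged in data producer-consumer order (illustrative only)."""
    tasks: List[Task] = field(default_factory=list)
    # Each edge is (producer name, consumer name, consumer input name).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, task: Task) -> Task:
        self.tasks.append(task)
        return task

    def connect(self, producer: Task, consumer: Task, input_name: str) -> None:
        # Checked at declaration time, not when the workflow runs.
        expected = consumer.inputs[input_name]
        if producer.output is not expected:
            raise TypeError(
                f"{producer.name} produces {producer.output}, "
                f"but {consumer.name}.{input_name} expects {expected}"
            )
        self.edges.append((producer.name, consumer.name, input_name))

    def run(self, **workflow_inputs: Any) -> Any:
        results: Dict[str, Any] = {}
        # Assumes tasks were added in dependency order.
        for task in self.tasks:
            kwargs = {}
            for name in task.inputs:
                producers = [p for p, c, i in self.edges
                             if c == task.name and i == name]
                if producers:
                    kwargs[name] = results[producers[0]]
                else:
                    kwargs[name] = workflow_inputs[name]
            results[task.name] = task.fn(**kwargs)
        return results[self.tasks[-1].name]


def count_words(text: str) -> int:
    return len(text.split())


def describe(count: int) -> str:
    return f"{count} words"


wf = Workflow()
counter = wf.add(Task(count_words))
reporter = wf.add(Task(describe))
wf.connect(counter, reporter, "count")  # int -> int, accepted
summary = wf.run(text="type safe data dependency graph")
```

Calling `wf.connect(reporter, reporter, "count")` would raise `TypeError` in this sketch, because `describe` produces a `str` while its `count` input expects an `int`.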
With Docker installed, run this command:
docker run --rm --privileged -p 30081:30081 -p 30084:30084 ghcr.io/flyteorg/flyte-sandbox
This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: Flyte is ready! Flyte UI is available at http://localhost:30081/console. Go ahead and visit http://localhost:30081/console.
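If you start the sandbox from a script, you may want to wait until the console answers before continuing. The following is a small sketch using only the Python standard library. The helper names `http_ok` and `wait_for_console`, the retry count, and the delay are assumptions chosen for illustration and are not part of Flyte. Only the URL comes from the message above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# URL printed by the sandbox once it is ready.
CONSOLE_URL = "http://localhost:30081/console"


def http_ok(url: str) -> bool:
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_console(
    url: str = CONSOLE_URL,
    attempts: int = 30,          # assumed value, tune for your machine
    delay_s: float = 2.0,        # assumed value
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the console URL until it responds or the attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay_s)
    return False
```

Calling `wait_for_console()` returns `True` once the console responds and `False` if it never does within the configured attempts.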
A quick visual tour of the console
Refer to Docs - Getting Started for a complete end-to-end example.
Resources that would help you get a better understanding of Flyte.
- 📣 Flyte OSS Community Sync: every alternate Tuesday, 9am-10am PDT (check out the events calendar & subscribe)
- You can join using the Zoom link.
- Meeting notes and a backlog of topics are captured in Doc
- Kubecon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform video | deck
- Kubecon 2019 - Running Large Scale Stateful workloads on Kubernetes at Lyft video
- re:invent 2019 - Implementing ML workflows with Kubernetes and Amazon Sagemaker video
- Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS video
- OSS + ELC NA 2020 splash
- Datacouncil splash
- FB AI@Scale Making MLOps & DataOps a reality
- GAIC 2020
- Introducing Flyte: A Cloud Native Machine Learning and Data Processing Platform
- Building a Gateway to Flyte
- TWIML&AI - Scalable and Maintainable ML Workflows at Lyft - Flyte
- Software Engineering Daily - Flyte: Lyft Data Processing Platform
- MLOps Coffee session - Flyte: an open-source tool for scalable, extensible, and portable workflows
- Used at scale in production by 500+ users at Lyft, with more than 900k workflows executed per month and more than 30 million container executions per month
- Fast registration - from local to remote in one second.
- Centralized Inventory of Tasks, Workflows and Executions
- Single Task Execution support - Start executing a task and then convert it to a workflow
- gRPC / REST interface to define and execute tasks and workflows
- Type-safe construction of pipelines: each task has an interface characterized by its inputs and outputs, so illegal construction of pipelines fails during declaration rather than at runtime
- Types that help in creating machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
- Memoization and Lineage tracking
- Workflows features
- Multiple Schedules for every workflow
- Parallel step execution
- Extensible Backend to add customized plugin experiences (with simplified User experiences)
- Arbitrary container execution
- Branching
- Inline Subworkflows (a workflow can be embedded within one node of the top-level workflow)
- Distributed Remote Child workflows (a remote workflow can be triggered and statically verified at compile time)
- Array Tasks (map a function over a large dataset, with controlled execution of thousands of containers)
- Dynamic Workflow creation and execution - with runtime type safety
- Container side plugins with first class support in python
- PreAlpha: Arbitrary flytekit-less containers supported (RawContainer)
- Maintain an inventory of tasks and workflows
- Record the history of all executions; executions are completely repeatable as long as they follow convention
- Multi Cloud support (AWS, GCP and others)
- Extensible core
- Modularized
- Automated notifications to Slack, Email, Pagerduty
- Deep observability
- Multi K8s cluster support
- Comes with many systems supported out of the box on K8s, such as Spark
- Snappy Console
- Python CLI
- Written in Golang and optimized for performance of large running jobs
- Golang CLI - flytectl
- Grafana templates (user/system observability)
- helm chart for Flyte
- Performance optimization
- Flink-K8s
- Containers
- K8s Pods
- AWS Batch Arrays
- K8s Pod arrays
- K8s Spark (native pyspark and java/scala)
- AWS Athena
- Qubole Hive
- Presto Queries
- Distributed Pytorch (K8s Native) - Pytorch Operator
- Sagemaker (builtin algorithms & custom models)
- Distributed Tensorflow (K8s Native) - TFOperator
- Papermill Notebook execution (python and spark)
- Type safe and data checking for Pandas dataframe using Pandera
- Reactive pipelines
- More integrations
Repo | Language | Purpose | Status |
---|---|---|---|
flyte | Kustomize,RST | deployment, documentation, issues | Production-grade |
flyteidl | Protobuf | interface definitions | Production-grade |
flytepropeller | Go | execution engine | Production-grade |
flyteadmin | Go | control plane | Production-grade |
flytekit | Python | Python SDK and tools | Production-grade |
flyteconsole | Typescript | admin console | Production-grade |
datacatalog | Go | manage input & output artifacts | Production-grade |
flyteplugins | Go | flyte plugins | Production-grade |
flytestdlib | Go | standard library | Production-grade |
flytesnacks | Python | examples, tips, and tricks | Incubating |
flytekit-java | Java/Scala | Java & Scala SDK for authoring Flyte workflows | Incubating |
flytectl | Go | A standalone Flyte CLI | Incomplete |
Repo | Language | Purpose |
---|---|---|
Spark | Go | Apache Spark batch |
Flink | Go | Apache Flink streaming |
Thank you to the community for making Flyte possible.
- @wild-endeavor
- @katrogan
- @EngHabu
- @akhurana001
- @anandswaminathan
- @kanterov
- @honnix
- @jeevb
- @jonathanburns
- @migueltol22
- @varshaparthay
- @pingsutw
- @narape
- @lu4nm3
- @bnsblue
- @RubenBarragan
- @schottra
- @evalsocket
- @matthewphsmith
- @slai
- @derwiki
- @tnsetting
- @jbrambleDC
- @igorvalko
- @chanadian
- @surindersinghp
- @vsbus
- @catalinii
- @kumare3