/flyte

Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source.

Primary LanguagePythonApache License 2.0Apache-2.0

Flyte and LF AI & Data Logo

Flyte

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale

Current Release Sandbox Build End-to-End Tests License Commit Activity Commits since Last Release GitHub Milestones Completed GitHub Next Milestone Percentage Docs Twitter Follow Flyte Helm Chart Join Flyte Slack

💥 Introduction

Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable, and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine -- it uses workflow as a core concept and task (a single unit of execution) as a top-level concept. Multiple tasks arranged in a data producer-consumer order creates a workflow.

Workflows and Tasks can be written in any language, with out-of-the-box support for Python, Java and Scala.

⏳ Five Reasons to Use Flyte

  • Kubernetes-Native Workflow Automation Platform
  • Ergonomic SDK's in Python, Java & Scala
  • Versioned & Auditable
  • Reproducible Pipelines
  • Strong Data Typing

🚀 Quick Start

With Docker installed and Flytectl installed, run the following command:

  flytectl sandbox start

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: Flyte is ready! Flyte UI is available at http://localhost:30081/console.

Visit http://localhost:30081/console to view the Flyte dashboard.

Here's a quick visual tour of the console.

Flyte console Example

To dig deeper into Flyte, refer to the Documentation.

⭐️ Current Deployments & Contributors

NOTE Please maintain an alphabetical order in the following list

🔥 Features

  • Used at Scale in production by 500+ users at Lyft with more than 1 million executions and 40+ million container executions per month
  • A data-aware platform
  • Enables collaboration across your organization by:
    • Executing distributed data pipelines/workflows
    • Reusing tasks across projects, users, and workflows
    • Making it easy to stitch together workflows from different teams and domain experts
    • Backtracing to a specified workflow
    • Comparing results of training workflows over time and across pipelines
    • Sharing workflows and tasks across your teams
    • Simplifying the complexity of multi-step, multi-owner workflows
  • Quick registration -- start locally and scale to the cloud instantly
  • Centralized Inventory constituting Tasks, Workflows, and Executions
  • gRPC / REST interface to define and execute tasks and workflows
  • Type safe construction of pipelines -- each task has an interface that is characterized by its input and output, so illegal construction of pipelines fails during declaration rather than at runtime
  • Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
  • Memoization and Lineage tracking
  • Provides logging and observability
  • Workflow features:
    • Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand
    • Parallel step execution
    • Extensible backend to add customized plugin experience (with simplified user experience)
    • Branching
  • Inline subworkflows (a workflow can be embedded within one node of the top-level workflow)
  • Distributed remote child workflows (a remote workflow can be triggered and statically verified at compile time)
  • Array Tasks (map a function over a large dataset -- ensures controlled execution of thousands of containers)
  • Dynamic workflow creation and execution with runtime type safety
  • Container side plugins with first-class support in Python
  • PreAlpha: Arbitrary flytekit-less containers supported (RawContainer)
  • Guaranteed reproducibility of pipelines via:
    • Versioned data, code, and models
    • Automatically tracked executions
    • Declarative pipelines
  • Multi-cloud support (AWS, GCP, and others)
  • Extensible core, modularized, and deep observability
  • No single point of failure and is resilient by design
  • Automated notifications to Slack, Email, and Pagerduty
  • Multi K8s cluster support
  • Out of the box support to run Spark jobs on K8s, Hive queries, etc.
  • Snappy Console
  • Python CLI and Golang CLI (flytectl)
  • Written in Golang and optimized for large running jobs' performance
  • Grafana templates (user/system observability)

In Progress

  • Demos; Distributed Pytorch, feature engineering, etc.
  • Integrations; Great Expectations, Feast
  • Least-privilege Minimal Helm Chart
  • Relaunch execution in recover mode
  • Documentation as code

🔌 Available Plugins

📦 Component Repos

Repo Language Purpose Status
flyte Kustomize,RST deployment, documentation, issues Production-grade
flyteidl Protobuf interface definitions Production-grade
flytepropeller Go execution engine Production-grade
flyteadmin Go control plane Production-grade
flytekit Python python SDK and tools Production-grade
flyteconsole Typescript admin console Production-grade
datacatalog Go manage input & output artifacts Production-grade
flyteplugins Go flyte plugins Production-grade
flytestdlib Go standard library Production-grade
flytesnacks Python examples, tips, and tricks Incubating
flytekit-java Java/Scala Java & scala SDK for authoring Flyte workflows Incubating
flytectl Go A standalone Flyte CLI Incomplete

🔩 Production K8s Operators

Repo Language Purpose
Spark Go Apache Spark batch
Flink Go Apache Flink streaming

🤝 Community & Resources

Here are some resources to help you learn more about Flyte.

Communication Channels

Biweekly Community Sync

  • 📣 Flyte OSS Community Sync Every other Tuesday, 9am-10am PDT. Check out the calendar and register to stay up-to-date with our meeting times. Or simply join us on Zoom.
  • Upcoming meeting agenda, previous meeting notes, and a backlog of topics are captured in this document.
  • If you'd like to revisit any previous community sync meetings, you can access the video recordings on Flyte's YouTube channel.

Conference Talks

  • Kubecon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform video | deck
  • Kubecon 2019 - Running LargeScale Stateful workloads on Kubernetes at Lyft video
  • re:invent 2019 - Implementing ML workflows with Kubernetes and Amazon Sagemaker video
  • Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS video
  • OSS + ELC NA 2020 splash
  • Datacouncil video | splash
  • FB AI@Scale Making MLOps & DataOps a reality
  • GAIC 2020
  • OSPOCon 2021 Catch a variety of Flyte talks - final schedule and topics to be released soon.

Blog Posts

Podcasts

💖 All Contributors

A big thank you to the community for making Flyte possible!

95335837090125759711827159298439431588921840823728965681083056215185247810805627777173168887095032356812285265339438806453936213261742137779831291427165487021181517519375241281668937936015623945060650517098816090976888811553313394496745865628984679272486881330233502655438207208345877988200209435878191086981516389913168811360529155087139945518105918042193412219405393965967747581833780754334265960998632262454333860700576548966647771677823138103873589516984748125105430853311269256612286338817639173091875796703117784071245063249699333659778008575382826953709847350361388071587700011973368318363301402301567166843253644907515359200401247399491043063540548010047891335881483070069161722650681014992189117539274759463047862440250586911142103451841646184725387601400897816334603125543430335921922904975488235803281027207156888993796725391173149968681043051117473044368997697033573204771284190262653923275593668531131078931300022132422546729420280470199429103263318519037114784118667547488631813081390277139198238800017703926178510511193187343420242167241733087241941910145045078542033884147090147964002533536441242107780986519378