Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Flyte

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale

💥 Introduction

Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable, and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine -- it uses workflow as a core concept and task (a single unit of execution) as a top-level concept. Multiple tasks arranged in a data producer-consumer order create a workflow.

Workflows and Tasks can be written in any language, with out-of-the-box support for Python, Java and Scala.
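The task and workflow relationship described above can be pictured with a short plain-Python sketch. This is not the flytekit API: the `Task` and `Workflow` classes, the `history` list, and the three example functions are hypothetical stand-ins, and the sketch assumes a purely linear chain where each task consumes the output of the one before it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Task:
    """A single unit of execution: a named function (hypothetical, not flytekit)."""
    name: str
    fn: Callable[[Any], Any]


@dataclass
class Workflow:
    """Tasks arranged in producer-consumer order (hypothetical, not flytekit)."""
    name: str
    tasks: List[Task]
    history: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, value: Any) -> Any:
        for task in self.tasks:
            result = task.fn(value)
            # Record each step, loosely mirroring the idea of an execution history.
            self.history.append({"task": task.name, "input": value, "output": result})
            value = result  # the output of one task becomes the input of the next
        return value


def load(n: int) -> List[int]:
    return list(range(n))


def square(xs: List[int]) -> List[int]:
    return [x * x for x in xs]


def total(xs: List[int]) -> int:
    return sum(xs)


pipeline = Workflow(
    "sum_of_squares",
    [Task("load", load), Task("square", square), Task("total", total)],
)
result = pipeline.run(4)  # 0 + 1 + 4 + 9 = 14
```

In Flyte itself the graph is not limited to a straight line, and the platform, rather than an in-memory list, stores the execution history.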

⏳ Five Reasons to Use Flyte

  • Kubernetes-Native Workflow Automation Platform
  • Ergonomic SDKs in Python, Java & Scala
  • Versioned & Auditable
  • Reproducible Pipelines
  • Strong Data Typing

🚀 Quick Start

With Docker and flytectl installed, run the following command:

  flytectl sandbox start

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: "Flyte is ready! Flyte UI is available at http://localhost:30081/console."

Visit http://localhost:30081/console to view the Flyte dashboard.
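If you script the sandbox startup, a small polling helper can tell you when the console starts answering. The sketch below uses only the Python standard library and the console URL quoted above. The function names `http_ok` and `wait_for_console`, the retry counts, and the assumption that a ready console answers with HTTP 200 are illustrative choices, not part of flytectl.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# URL printed by `flytectl sandbox start` once the sandbox is ready.
CONSOLE_URL = "http://localhost:30081/console"


def http_ok(url: str) -> bool:
    """Return True if the URL answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_console(
    url: str = CONSOLE_URL,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Typical use after starting the sandbox:
#     ready = wait_for_console()
```

The `probe` parameter exists so the loop can be exercised without a running sandbox, for example by passing a function that returns `True` on the third call.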

Here's a quick visual tour of the console.

[Image: Flyte console example]

To dig deeper into Flyte, refer to the Documentation.

⭐️ Current Deployments & Contributors

🔥 Features

  • Used at scale in production by 500+ users at Lyft, with more than 1 million executions and 40+ million container executions per month
  • A data-aware platform
  • Enables collaboration across your organization by:
    • Executing distributed data pipelines/workflows
    • Reusing tasks across projects, users, and workflows
    • Making it easy to stitch together workflows from different teams and domain experts
    • Backtracing to a specified workflow
    • Comparing results of training workflows over time and across pipelines
    • Sharing workflows and tasks across your teams
    • Simplifying the complexity of multi-step, multi-owner workflows
  • Quick registration -- start locally and scale to the cloud instantly
  • Centralized inventory of tasks, workflows, and executions
  • gRPC / REST interface to define and execute tasks and workflows
  • Type-safe construction of pipelines -- each task has an interface that is characterized by its input and output, so illegal construction of pipelines fails during declaration rather than at runtime
  • Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
  • Memoization and Lineage tracking
  • Provides logging and observability
  • Workflow features:
    • Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand
    • Parallel step execution
    • Extensible backend for adding custom plugins while keeping the user experience simple
    • Branching
  • Inline subworkflows (a workflow can be embedded within one node of the top-level workflow)
  • Distributed remote child workflows (a remote workflow can be triggered and statically verified at compile time)
  • Array Tasks (map a function over a large dataset -- ensures controlled execution of thousands of containers)
  • Dynamic workflow creation and execution with runtime type safety
  • Container-side plugins with first-class support in Python
  • Pre-alpha: support for arbitrary flytekit-less containers (RawContainer)
  • Guaranteed reproducibility of pipelines via:
    • Versioned data, code, and models
    • Automatically tracked executions
    • Declarative pipelines
  • Multi-cloud support (AWS, GCP, and others)
  • Extensible, modular core with deep observability
  • No single point of failure; resilient by design
  • Automated notifications to Slack, email, and PagerDuty
  • Multi K8s cluster support
  • Out-of-the-box support to run Spark jobs on K8s, Hive queries, etc.
  • Snappy Console
  • Python CLI and Golang CLI (flytectl)
  • Written in Golang and optimized for the performance of large running jobs
  • Grafana templates (user/system observability)
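
The type-safe construction feature listed above (illegal pipelines fail during declaration rather than at runtime) can be illustrated with a small plain-Python sketch. Flyte checks interfaces against its own type system; the helper `check_chain` and the example functions here are hypothetical and only compare Python type hints on a linear chain of single-input steps.

```python
from typing import Callable, List, get_type_hints


def check_chain(*steps: Callable) -> None:
    """Raise TypeError if a step's return type differs from the next step's input type.

    Only annotations are inspected; none of the steps is executed.
    """
    for producer, consumer in zip(steps, steps[1:]):
        produced = get_type_hints(producer).get("return")
        consumer_hints = get_type_hints(consumer)
        expected = [t for name, t in consumer_hints.items() if name != "return"]
        if expected != [produced]:
            raise TypeError(
                f"{producer.__name__} produces {produced}, "
                f"but {consumer.__name__} expects {expected}"
            )


executed = []  # filled only if a step actually runs


def read_rows() -> List[int]:
    executed.append("read_rows")
    return [1, 2, 3]


def mean(rows: List[int]) -> float:
    executed.append("mean")
    return sum(rows) / len(rows)


def shout(text: str) -> str:
    executed.append("shout")
    return text.upper()


check_chain(read_rows, mean)  # passes: List[int] feeds List[int]

try:
    check_chain(read_rows, shout)  # rejected: List[int] cannot feed str
    declaration_error = None
except TypeError as err:
    declaration_error = str(err)
```

Because the mismatch is detected from the declared interfaces alone, `executed` stays empty: the bad pipeline is rejected before any step runs.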

In Progress

  • Demos: distributed PyTorch, feature engineering, etc.
  • Integrations: Great Expectations, Feast
  • Least-privilege Minimal Helm Chart
  • Relaunch execution in recover mode
  • Documentation as code

🔌 Available Plugins

📦 Component Repos

Repo           | Language       | Purpose                                     | Status
-------------- | -------------- | ------------------------------------------- | ----------------
flyte          | Kustomize, RST | deployment, documentation, issues           | Production-grade
flyteidl       | Protobuf       | interface definitions                       | Production-grade
flytepropeller | Go             | execution engine                            | Production-grade
flyteadmin     | Go             | control plane                               | Production-grade
flytekit       | Python         | Python SDK and tools                        | Production-grade
flyteconsole   | TypeScript     | admin console                               | Production-grade
datacatalog    | Go             | manage input & output artifacts             | Production-grade
flyteplugins   | Go             | Flyte plugins                               | Production-grade
flytestdlib    | Go             | standard library                            | Production-grade
flytesnacks    | Python         | examples, tips, and tricks                  | Incubating
flytekit-java  | Java/Scala     | Java & Scala SDK for authoring Flyte workflows | Incubating
flytectl       | Go             | a standalone Flyte CLI                      | Incomplete

🔩 Production K8s Operators

Repo  | Language | Purpose
----- | -------- | ----------------------
Spark | Go       | Apache Spark batch
Flink | Go       | Apache Flink streaming

🤝 Community & Resources

Here are some resources to help you learn more about Flyte.

Communication Channels

Biweekly Community Sync

  • 📣 Flyte OSS Community Sync: every other Tuesday, 9am-10am PDT. Check out the calendar and register to stay up-to-date with our meeting times. Or simply join us on Zoom.
  • Upcoming meeting agenda, previous meeting notes, and a backlog of topics are captured in this document.
  • If you'd like to revisit any previous community sync meetings, you can access the video recordings on Flyte's YouTube channel.

Conference Talks

  • KubeCon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform (video | deck)
  • KubeCon 2019 - Running Large-Scale Stateful Workloads on Kubernetes at Lyft (video)
  • re:Invent 2019 - Implementing ML workflows with Kubernetes and Amazon SageMaker (video)
  • Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS (video)
  • OSS + ELC NA 2020 (splash)
  • Data Council (video | splash)
  • FB AI@Scale - Making MLOps & DataOps a reality
  • GAIC 2020
  • OSPOCon 2021 - Catch a variety of Flyte talks; final schedule and topics to be released soon.

Blog Posts

Podcasts

💖 All Contributors

A big thank you to the community for making Flyte possible!
