/dagster

A data orchestrator for machine learning, analytics, and ETL.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Dagster is an orchestration platform for the development, production, and observation of data assets.

  • Develop and test locally, then deploy anywhere: With Dagster, the same computations can run in-process against your local file system or on a distributed work queue against your production data lake. You can develop locally on your laptop, deploy on-premises, or run in any cloud.
  • Model the data produced and consumed: In your orchestration graph, Dagster models data dependencies and handles how data passes between steps. Gradual typing on inputs and outputs catches bugs early.
  • Link data to computations: Dagster’s Asset Catalog tracks the data sets and ML models produced by your jobs. Understand how they were generated and trace issues when asset declarations do not match their materializations in storage.
  • Build a self-service data platform: Dagster helps platform teams build systems for data practitioners. Jobs are built from shared, reusable, configurable data processing components. Dagit, Dagster’s web interface, lets anyone inspect these objects and discover how to use them.
  • Declare and isolate dependencies: Dagster’s server model enables you to isolate codebases. Problems in one job will not bring down the system or other jobs. Each job can have its own package dependencies and Python version.
  • Debug jobs from a rich interface: Dagit includes extensive facilities for understanding the jobs it orchestrates. When inspecting a run of your job, you can query the logs, find the most time-consuming tasks in a Gantt chart, re-execute subsets of steps, and more.

Installation

Dagster is available on PyPI and officially supports Python 3.6+.

pip install dagster dagit

This installs two modules:

  • Dagster: The core programming model.
  • Dagit: The web interface for developing and operating Dagster jobs. It includes a DAG browser, a type-aware interface to launch runs, a live view for in-progress runs, a catalog to view your data assets, and more.

For a quick overview, check out our Getting Started page.

Documentation

You can find the Dagster documentation on the website, where it is divided into several sections.

Community

Connect with thousands of other data practitioners building with Dagster. Share knowledge, get help, and contribute to the open-source project. To see featured material and upcoming events, check out our Dagster Community page.

Contributing

For details on contributing or running the project for development, check out our contributing guide.

License

Dagster is Apache 2.0 licensed.