Orchest examples

This is a list of official and community-submitted examples 🤗. Orchest uses this list to propose starter examples to users, and shows information such as the author and the number of stars and forks of each repo. If you would like to be part of this, make a PR!

Contributing

Make a PR that adds a new entry to the list of examples in this README.md. This entry must have the following format (mind the spaces!):

- [title](github url) - description (length limit of 280) - `tag1` `tag2` `tag3` (up to five tags)
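Because the format is strict about spacing, it can help to check an entry before opening the PR. The following is a minimal sketch, not the check Orchest itself runs: the regular expression, the `check_entry` helper, and the assumption that URLs start with `https://github.com/` are all illustrative, and the repo URL in the usage line is hypothetical.

```python
import re

# Assumed pattern for one entry line:
# "- [title](github url) - description - `tag1` `tag2`"
ENTRY_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>https://github\.com/\S+?)\)"
    r" - (?P<description>.+)"
    r" - (?P<tags>`[^`]+`(?: `[^`]+`)*)$"
)


def check_entry(line):
    """Return a list of problems found in one entry line (empty if none)."""
    match = ENTRY_RE.match(line)
    if match is None:
        return ["line does not match the expected format"]
    problems = []
    # The README sets a length limit of 280 for the description.
    if len(match["description"]) > 280:
        problems.append("description longer than 280 characters")
    # The README allows up to five tags.
    tags = re.findall(r"`([^`]+)`", match["tags"])
    if len(tags) > 5:
        problems.append("more than five tags")
    return problems


# Hypothetical entry used only for illustration.
entry = (
    "- [My pipeline](https://github.com/example-user/example-repo)"
    " - A short description. - `python` `scraping`"
)
print(check_entry(entry))
```

An empty list means the line passed these checks; it does not guarantee that the PR will be accepted.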

Help other users try out your pipeline with one click by adding a badge to the README.md of your repository, using:

[![Open in Orchest](https://github.com/orchest/orchest-examples/raw/main/imgs/open_in_orchest.svg)](https://cloud.orchest.io/?import_url=your-repo-url)

Note: you need to replace your-repo-url with your repo URL.
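If you maintain several repositories, the substitution can be scripted. This is a small sketch under stated assumptions: the `orchest_badge` helper is hypothetical, the repo URL in the usage line is a placeholder, and the URL is inserted as-is, exactly as the template above suggests, without any extra encoding.

```python
# Both URLs are copied from the badge template in this README.
BADGE_IMAGE = (
    "https://github.com/orchest/orchest-examples/raw/main/imgs/open_in_orchest.svg"
)
IMPORT_BASE = "https://cloud.orchest.io/?import_url="


def orchest_badge(repo_url):
    """Return the badge markdown with your-repo-url replaced by repo_url."""
    return f"[![Open in Orchest]({BADGE_IMAGE})]({IMPORT_BASE}{repo_url})"


# Hypothetical repository URL used only for illustration.
print(orchest_badge("https://github.com/example-user/example-repo"))
```

Paste the printed line into the README.md of your repository.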

An example badge to import our quickstart repo in Orchest:

And thank you 💗!

Examples

Quickstart Pipeline - A quickstart pipeline that trains some simple models in parallel. - quickstart machine-learning training scikit-learn
Run PySpark in Orchest - A hello-world example of how to run (Py)Spark locally in Orchest. It also contains code for connecting to a remote Spark cluster. - pyspark spark cluster
Using Selenium with Python in Orchest - Scrape webpages with Selenium - scraping selenium
Google Search Console API - A minimal example of how to fetch Google Search Console data through their Python API. - api google
Global Key Value store - A minimal example of how to use a filesystem-based global key-value store. It uses a simple Python dictionary with SQLite as the backing store. - utility
Orchest + dbt - Use dbt inside of Orchest for your materialized views. - python dbt sql
Image Super-Resolution - Use Image Super-Resolution (ISR) to enhance any image with different methods. - python super-resolution machine-learning computer-vision
Coqui TTS - Generate an audio snippet from a text sample and send it as a message on Slack/Discord. - tts audio machine-learning
Redis and Postgres - An example of how to use Redis and Postgres in an Orchest pipeline. - postgres services
Weaviate + Orchest - Search scraped comments with semantic vector search. - nlp streamlit search scraping
Polyglot: Python, Julia and R in one pipeline - An example pipeline showing how to use multiple languages in the same Orchest pipeline. - environments julia r python
Web Scraping using Photon - A pipeline that uses the open source Photon library for webscraping. Use this as a starting point for a data ingest pipeline. - scraping
Search HN comments with PyWebIO - Use web scraping, Meilisearch and PyWebIO for lightning fast comment search on HN. - python pywebio scraping
Mixing R and Python in one pipeline - A pipeline showcasing how Python and R can be used within the same pipeline. It also shows how you can call the Orchest SDK from within R. - r python
Calling the Orchest SDK from Julia - An example pipeline that uses PyCall to be able to call the Orchest SDK from within Julia. - julia
OAuth QuickBooks example project - Specific example of using the QuickBooks OAuth API in Orchest, but can be used for any OAuth 2.0 authentication flow. - python oauth finance
Two phase pipeline + Streamlit - This is an example project that demonstrates how to create a pipeline that consists of two phases of execution. - python streamlit
Scraped language classifier - This pipeline classifies random text paragraphs found on websites linked to from random Wikipedia pages. - python scraping streamlit
Deep_AutoViML Pipeline - Use the popular Python library Deep_AutoViML to build multiple deep learning Keras models on any dataset of any size with this pipeline. Data must be in the data folder, and models are saved in your project folder. - quickstart keras machine-learning tensorflow
AutoViz Pipeline - Use the popular Python library AutoViz to visualize any dataset of any size with this pipeline. Data must be in the data folder, and charts are saved in the AutoViz_Plots folder. - quickstart auto-visualization machine-learning
Orchest + Coiled: spawn cluster and run XGBoost - Spin up a Coiled cluster and run an XGBoost training loop on it. Cluster creation is a separate step so that it can be reused. - dask coiled xgboost machine-learning
Experimenting with PyArrow - Experimenting with PyArrow in Orchest - arrow pyarrow
Out-of-core processing with Vaex - Out-of-core processing with Vaex in Orchest - vaex parquet
Connecting to an external database using SQLAlchemy - Connecting to an external database using SQLAlchemy - sqlalchemy postgresql databases
Reading +1M Stack Overflow questions with Polars - Reading +1M Stack Overflow questions with Polars - polars dataframes pandas
Running SQL statements directly in Jupyter using ipython-sql - Running SQL statements directly in Jupyter using ipython-sql - postgresql databases sql
ELT pipeline in Orchest with meltano and dbt - Creating an ELT pipeline in Orchest that extracts data from PostgreSQL and loads it to BigQuery using meltano and dbt - elt pipeline meltano dbt bigquery
Make the most of your Google Analytics data with Orchest and Meltano - Export the raw events generated by Google Analytics 4 to your data warehouse, using Orchest for orchestration, Meltano for Extraction & Loading (EL), and Metabase for visualization - elt pipeline meltano google-analytics
Detect anomalies on your time series data with Orchest and Clarify - Create a pipeline that loads time series data from Clarify, trains an anomaly detection model, writes back the anomalies, and notifies you - pipeline clarify time-series anomaly-detection
Drift report with Evidently - Create a drift report using Evidently - drift evidently
Analyzing +4.6M Reddit comments with DuckDB - Analyze +4.6M Reddit comments with DuckDB from Parquet files - duckdb sql arrow