This is a list of official and community-submitted examples 🤗. Orchest uses this list to propose starter examples to users, showing information such as the author and the repo's number of stars and forks. If you would like to be part of it, make a PR!
Make a PR that adds a new entry to the list of examples in this `README.md`. The entry must have the following format (mind the spaces!):
- [title](github url) -<!--o--> description (length limit of 280) <!--o-->- `tag1` `tag2` `tag3` (up to five tags)
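Before opening the PR, an entry can be checked locally against the format above. The following is a minimal sketch, not the parser Orchest actually uses: the regular expression, the `check_entry` helper, and the assumption that the URL starts with `https://github.com/` are all illustrative.

```python
import re

# Hypothetical pattern derived from the format line above; the real parser may differ.
ENTRY_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>https://github\.com/[^)\s]+)\) "
    r"-<!--o--> (?P<description>.+?) <!--o-->- "
    r"(?P<tags>`[^`\s]+`(?: `[^`\s]+`)*)$"
)


def check_entry(line: str) -> list:
    """Return a list of problems found in one README entry (empty if none)."""
    match = ENTRY_RE.match(line)
    if match is None:
        return ["line does not match the entry format (check the spaces)"]
    problems = []
    # The description is limited to 280 characters.
    if len(match["description"]) > 280:
        problems.append("description is longer than 280 characters")
    # At most five backtick-quoted tags are allowed.
    tags = re.findall(r"`([^`]+)`", match["tags"])
    if len(tags) > 5:
        problems.append("more than five tags")
    return problems


# Placeholder entry; replace the title, URL, description and tags with your own.
entry = (
    "- [My pipeline](https://github.com/your-user/your-repo) "
    "-<!--o--> A short description. <!--o-->- `python` `etl`"
)
print(check_entry(entry))
```

An empty list means the line has the expected shape, a description within the limit, and no more than five tags.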
Help other users try out your pipeline with one click by adding a badge to the `README.md` of your repository, using:
[![Open in Orchest](https://github.com/orchest/orchest-examples/raw/main/imgs/open_in_orchest.svg)](https://cloud.orchest.io/?import_url=your-repo-url)
Note: you need to replace `your-repo-url` with your repo URL.
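If you maintain several repositories, the substitution can be scripted. This is a small sketch that only fills the repo URL into the badge snippet shown above; the `orchest_badge` helper and the example repository URL are hypothetical.

```python
# Both URLs are taken from the badge snippet in this README.
BADGE_IMG = "https://github.com/orchest/orchest-examples/raw/main/imgs/open_in_orchest.svg"
IMPORT_BASE = "https://cloud.orchest.io/?import_url="


def orchest_badge(repo_url: str) -> str:
    """Return the badge Markdown with `your-repo-url` replaced by repo_url."""
    return f"[![Open in Orchest]({BADGE_IMG})]({IMPORT_BASE}{repo_url})"


# Placeholder repository URL; use your own.
print(orchest_badge("https://github.com/your-user/your-repo"))
```

Paste the printed line into the `README.md` of your repository.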
An example badge to import our quickstart repo in Orchest:
And thank you 💗!
- Quickstart Pipeline - A quickstart pipeline that trains some simple models in parallel. - `quickstart` `machine-learning` `training` `scikit-learn`
- Run PySpark in Orchest - A hello world example of how to run (Py)Spark locally in Orchest. It also contains code for connecting to a remote Spark cluster. - `pyspark` `spark` `cluster`
- Using Selenium with Python in Orchest - Scrape webpages with Selenium. - `scraping` `selenium`
- Google Search Console API - A minimal example of how to fetch Google Search Console data through their Python API. - `api` `google`
- Global Key Value store - A minimal example of how to use a filesystem-based global key value store. It uses a simple Python dictionary with SQLite as the backing store. - `utility`
- Orchest + dbt - Use dbt inside of Orchest for your materialized views. - `python` `dbt` `sql`
- Image Super-Resolution - Use Image Super-Resolution (ISR) to enhance any image with different methods. - `python` `super-resolution` `machine-learning` `computer-vision`
- Coqui TTS - Generate an audio snippet from a text sample and send it as a message on Slack/Discord. - `tts` `audio` `machine-learning`
- Redis and Postgres - An example of how to use Redis and Postgres in an Orchest pipeline. - `postgres` `services`
- Weaviate + Orchest - Search scraped comments with semantic vector search. - `nlp` `streamlit` `search` `scraping`
- Polyglot: Python, Julia and R in one pipeline - An example pipeline showing how to use multiple languages in the same Orchest pipeline. - `environments` `julia` `r` `python`
- Web Scraping using Photon - A pipeline that uses the open source Photon library for web scraping. Use this as a starting point for a data ingest pipeline. - `scraping`
- Search HN comments with PyWebIO - Use web scraping, Meilisearch and PyWebIO for lightning-fast comment search on HN. - `python` `pywebio` `scraping`
- Mixing R and Python in one pipeline - A pipeline showcasing how Python and R can be used within the same pipeline. It also shows how you can call the Orchest SDK from within R. - `r` `python`
- Calling the Orchest SDK from Julia - An example pipeline that uses PyCall to call the Orchest SDK from within Julia. - `julia`
- OAuth QuickBooks example project - A specific example of using the QuickBooks OAuth API in Orchest that can be adapted to any OAuth 2.0 authentication flow. - `python` `oauth` `finance`
- Two phase pipeline + Streamlit - An example project that demonstrates how to create a pipeline that consists of two phases of execution. - `python` `streamlit`
- Scraped language classifier - This pipeline classifies random text paragraphs found on websites linked to from random Wikipedia pages. - `python` `scraping` `streamlit`
- Deep_AutoViML Pipeline - Use the popular Python library Deep_AutoViML to build multiple deep learning Keras models on any dataset of any size with this pipeline. Data must be in the data folder and models are saved in your project folder. - `quickstart` `keras` `machine-learning` `tensorflow`
- AutoViz Pipeline - Use the popular Python library AutoViz to visualize any dataset of any size with this pipeline. Data must be in the data folder and charts are saved in the AutoViz_Plots folder. - `quickstart` `auto-visualization` `machine-learning`
- Orchest + Coiled: spawn cluster and run XGBoost - Spin up a Coiled cluster and run an XGBoost train loop on it. The Coiled cluster creation is a separate step to make it reusable. - `dask` `coiled` `xgboost` `machine-learning`
- Experimenting with PyArrow - Experimenting with PyArrow in Orchest. - `arrow` `pyarrow`
- Out-of-core processing with Vaex - Out-of-core processing with Vaex in Orchest. - `vaex` `parquet`
- Connecting to an external database using SQLAlchemy - Connecting to an external database using SQLAlchemy. - `sqlalchemy` `postgresql` `databases`
- Reading +1M Stack Overflow questions with Polars - Reading +1M Stack Overflow questions with Polars. - `polars` `dataframes` `pandas`
- Running SQL statements directly in Jupyter using ipython-sql - Running SQL statements directly in Jupyter using ipython-sql. - `postgresql` `databases` `sql`
- ELT pipeline in Orchest with meltano and dbt - Creating an ELT pipeline in Orchest that extracts data from PostgreSQL and loads it to BigQuery using meltano and dbt. - `elt` `pipeline` `meltano` `dbt` `bigquery`
- Make the most of your Google Analytics data with Orchest and Meltano - Export the raw events generated by Google Analytics 4 to your data warehouse, using Orchest for orchestration, Meltano for Extraction & Loading (EL), and Metabase for visualization. - `elt` `pipeline` `meltano` `google-analytics`
- Detect anomalies on your time series data with Orchest and Clarify - Create a pipeline that loads time series data from Clarify, trains an anomaly detection model, writes back the anomalies, and notifies you. - `pipeline` `clarify` `time-series` `anomaly-detection`
- Drift report with Evidently - Create a drift report using Evidently. - `drift` `evidently`
- Analyzing +4.6M Reddit comments with DuckDB - Analyze +4.6M Reddit comments with DuckDB from Parquet files. - `duckdb` `sql` `arrow`