danhphan's Stars
binhnguyennus/awesome-scalability
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
localstack/localstack
💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline
bigskysoftware/htmx
</> htmx - high power tools for HTML
duckdb/duckdb
DuckDB is an analytical in-process SQL database management system
modularml/mojo
The Mojo Programming Language
timescale/timescaledb
An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
dabeaz-course/python-mastery
Advanced Python Mastery (course by @dabeaz)
HypothesisWorks/hypothesis
Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
apache/datafusion
Apache DataFusion SQL Query Engine
evidence-dev/evidence
Business intelligence as code: build fast, interactive data visualizations in SQL and markdown
hudson-and-thames/mlfinlab
MlFinLab helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy to use tools.
awslabs/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Kludex/mangum
AWS Lambda support for ASGI applications
apache/iceberg-python
Apache PyIceberg
aws-cloudformation/custom-resource-helper
Simplify best practice Custom Resource creation, sending responses to CloudFormation and providing exception, timeout trapping, and detailed configurable logging.
brooklyn-data/dbt_artifacts
A dbt package for modelling dbt metadata. https://brooklyn-data.github.io/dbt_artifacts
sutoiku/puffin
Serverless HTAP cloud data platform powered by Arrow × DuckDB × Iceberg
tabular-io/docker-spark-iceberg
spbail/dag-stack
Data pipeline with dbt, Airflow, Great Expectations
re-data/dbt-re-data
re_data - fix data issues before your users & CEO would discover them 😊
monte-carlo-data/data-downtime-challenge
airbytehq/open-data-stack
Open Data Stack Projects: Examples of End to End Data Engineering Projects
djouallah/Testing_BI_Engine
TPC-H_SF10
dpguthrie/dbtc
danhphan/trusted-data-pipeline
Building 3D Trusted Data Pipelines With Dagster, dbt, and DuckDB
SamHames/hyperreal
A Python package for interpretive topic modelling
danhphan/workshops
workshops