danielbeach
Data Engineer. Data lover. Data warehouse expert. Python, Rust, SQL, Databricks, and Delta Lake are all I need in life.
Iowa
Pinned Repositories
data-engineering-practice
Data Engineering Practice Problems
DataEngineeringProjects
Some example projects for Data Engineers to build, end-to-end.
dataEngineeringTemplate
Template for Data Engineering and Data Pipeline projects
datahobbit
A Rust-based data/CSV/Parquet file generator
DuckDBwithAWSLambda
Using DuckDB with AWS Lambda to process Delta Lake data
lakescum
A Python package to help Databricks Unity Catalog users read and query Delta Lake tables with Polars, DuckDB, or PyArrow.
reepicheep
This is a `Rust`-based package to help manage complex medicine (pill) cycles.
sniffer
CSV and flat-file sniffer built in Rust.
tinytimmy
A simple, easy-to-use Data Quality (DQ) tool built with Python.
unitTestPySpark
How to unit test your PySpark code
danielbeach's Repositories
danielbeach/data-engineering-practice
Data Engineering Practice Problems
danielbeach/tinytimmy
A simple, easy-to-use Data Quality (DQ) tool built with Python.
danielbeach/datahobbit
A Rust-based data/CSV/Parquet file generator
danielbeach/sniffer
CSV and flat-file sniffer built in Rust.
danielbeach/DataEngineeringProjects
Some example projects for Data Engineers to build, end-to-end.
danielbeach/reepicheep
This is a `Rust`-based package to help manage complex medicine (pill) cycles.
danielbeach/DuckDBwithAWSLambda
Using DuckDB with AWS Lambda to process Delta Lake data
danielbeach/lakescum
A Python package to help Databricks Unity Catalog users read and query Delta Lake tables with Polars, DuckDB, or PyArrow.
danielbeach/PolarsVsPySpark
Can Polars crunch 27 GB of data faster than PySpark?
danielbeach/polars-DeltaLake
Trying out the Polars DataFrame library with Delta Lake ... feat. Python.
danielbeach/RustOnApacheAirflow
The ultimate Data Engineering Chadstack. Apache Airflow running Rust. Bring it.
danielbeach/DuckdbAndDeltaLake
Learning how to query remote S3 Delta Lake with DuckDB.
danielbeach/fine-tune-openLLaMA
This repo shows how to fine-tune the openLLaMA (7B) model on a GPU.
danielbeach/dqxDatabricksDataQuality
Trying out the new dqx Data Quality library from Databricks Labs
danielbeach/pyarrow-v-duckdb-v-polars
Compare PyArrow, DuckDB, and Polars for writing data pipelines.
danielbeach/datafusion-sql-cli
Playing around and making ETL tools with DataFusion's CLI SQL tool.
danielbeach/DuckDBwithJSONfiles
Processing JSON files in S3 with DuckDB
danielbeach/PolarsDateTimeManipulation
Polars date and time manipulation
danielbeach/puddleglum
Rust-based package for answering questions about S3 buckets and files
danielbeach/pythonRustLambda
Using the Delta Lake Python bindings for delta-rs to write large CSV files to Delta Lake tables.
danielbeach/sparklepop
SparklePop is a simple Python package designed to check the free disk space of an AWS RDS instance.
danielbeach/try-smallpond
Trying out smallpond, which is built on DuckDB
danielbeach/DataEngineeringWithFortran
Trying to use Fortran to write a data pipeline
danielbeach/learning-daft
Trying out Daft for Dataframes
danielbeach/scrounger
A `Rust`-based Python package that offers a faster alternative to `vulture` for finding dead and unused code in Python repositories.
danielbeach/BoggleyWollah
Python tooling to check the health of your Lakehouse (Delta Lake)
danielbeach/csvRustIteration
Learning how to download, unpack, and read CSV files in Rust.
danielbeach/dbtDatabricks
Trying out dbt on Databricks
danielbeach/IowaWaterQuality
Some Python for looking at USGS data on nitrate in Iowa's rivers.
danielbeach/smallpond
A lightweight data processing framework built on DuckDB and 3FS.