hendrikmakait
OSS engineer focused on scalable data and ML systems, working on @dask at @coiled
@coiled
Hamburg, Germany
Pinned Repositories
dask
Parallel computing with task scheduling
dask-expr
distributed
A distributed task scheduler for Dask
ablog
ABlog for blogging with Sphinx
aim3-eda
AIM3-SparkSQL-Demo
Spark SQL demo for AIM-3 Scalable Data Science at TU Berlin
dask-opentelemetry
Instrument Dask with OpenTelemetry
pydata-berlin-2023
Slides and resources for my talk on "Observability for Distributed Computing with Dask"
hendrikmakait's Repositories
hendrikmakait/distributed
A distributed task scheduler for Dask
hendrikmakait/ablog
ABlog for blogging with Sphinx
hendrikmakait/arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
hendrikmakait/arrow-rs
Official Rust implementation of Apache Arrow
hendrikmakait/blog_os
Implementation of the Writing an OS in Rust series at os.phil-opp.com.
hendrikmakait/cloudpickle
Extended pickling support for Python objects
hendrikmakait/dask
Parallel computing with task scheduling
hendrikmakait/dask-cloudprovider
Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure, and more.
hendrikmakait/dask-expr
hendrikmakait/dask-jobqueue
Deploy Dask on job schedulers like PBS, SLURM, and SGE
hendrikmakait/dask-maintenance
Repository for tooling around Dask Maintenance
hendrikmakait/dask-ml
Scalable Machine Learning with Dask
hendrikmakait/dask-pyspy
Profile the dask distributed scheduler with py-spy and viztracer
hendrikmakait/dask-sql
Distributed SQL Engine in Python using Dask
hendrikmakait/dask-tpcdi
A Dask-based ETL showcase based on the TPC-DI benchmark
hendrikmakait/datafusion
Apache Arrow DataFusion SQL Query Engine
hendrikmakait/dotfiles
hendrikmakait/filesystem_spec
A specification that python filesystems should adhere to.
hendrikmakait/governance
The governance process and model for Dask
hendrikmakait/hendrikmakait.github.io
Professional Website
hendrikmakait/iceberg-python
Apache PyIceberg
hendrikmakait/icechunk
Open-source, cloud-native transactional tensor storage engine
hendrikmakait/p2p-workbench
Workbench for performance optimizing Dask's P2P shuffling
hendrikmakait/polars
Fast multi-threaded, hybrid-out-of-core query engine focussing on DataFrame front-ends
hendrikmakait/py-spy
Sampling profiler for Python programs
hendrikmakait/python-tblib
Serialization library for Exceptions and Tracebacks.
hendrikmakait/qmk_userspace
Userspace for the open-source QMK keyboard firmware.
hendrikmakait/toolz
A functional standard library for Python.
hendrikmakait/xarray
N-D labeled arrays and datasets in Python
hendrikmakait/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow