holdenk
Holden Karau is a trans Canadian and an open source contributor. She is a Spark committer and co-author of Learning Spark, High Performance Spark, and Kubeflow for ML.
Open Source Big Data Dev
San Francisco, CA, USA
Pinned Repositories
chef-cookbook-spark
A Chef cookbook for deploying Spark
elasticsearchspark
Elastic Search on Spark
fastdataprocessingwithsparkexamples
Examples for Fast Data Processing with Spark
learning-spark-examples
Examples for Learning Spark
spark-flowchart
Flowchart for debugging Spark applications
spark-structured-streaming-ml
Structured Streaming Machine Learning example with Spark 2.0
spark-testing-base
Base classes to use when writing tests with Spark
spark-upgrade
Magic to help Spark pipelines upgrade
spark-validator
A library you can include in your Spark job to validate the counters and perform operations on success. The goal is Scala/Java/Python support.
sparkProjectTemplate.g8
Template for Spark Projects
holdenk's Repositories
holdenk/spark-testing-base
Base classes to use when writing tests with Spark
holdenk/spark-flowchart
Flowchart for debugging Spark applications
holdenk/sparkProjectTemplate.g8
Template for Spark Projects
holdenk/spark-upgrade
Magic to help Spark pipelines upgrade
holdenk/high-performance-spark-examples
Examples for High Performance Spark
holdenk/distributedcomputing4kids
distributedcomputing4kids
holdenk/spark
Mirror of Apache Spark
holdenk/resume
LaTeX resume
holdenk/spark-misc-utils
Misc Utils for Spark
holdenk/colo-scripts
holdenk/explore-dolly
Exploring what we can do with Databricks' Dolly (and similar)
holdenk/mydotfiles
My dotfiles. You probably don't care about this.
holdenk/sparklingpinkpandas
Website for Sparkling Pink Pandas (queer, trans focused scooter club)
holdenk/data-validator
A tool to validate data, built around Apache Spark.
holdenk/gluten
Gluten: Plugin to Double SparkSQL's Performance
holdenk/ray
A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
holdenk/spark-connect-rs
Apache Spark Connect Client for Rust
holdenk/arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
holdenk/bitsandbytes
8-bit CUDA functions for PyTorch
holdenk/django-rest-framework-braces
Collection of utilities for working with django rest framework (DRF)
holdenk/dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
holdenk/lit-parrot
Implementation of Falcon, StableLM, Pythia, INCITE language models based on nanoGPT. Supports flash attention, LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
holdenk/looking-glass
Easy to deploy Looking Glass
holdenk/nivenly-website
holdenk/obico-server
Obico is a community-built, open-source smart 3D printing platform used by makers, enthusiasts, and tinkerers around the world.
holdenk/onetable
OneTable is an omni-directional converter for table formats that facilitates interoperability across data processing systems and query engines.
holdenk/spark-expectations
A Python library to support running data quality rules while the Spark job is running ⚡
holdenk/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
holdenk/uszipcode-project
USA zipcode programmable database, includes up-to-date census and geometry information.
holdenk/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs