Pinned Repositories
bingo
bingo is a toolkit for building microservices. It is meant to be used along with https://goji.io
blitz
Blitz is an HTTP load-testing and benchmarking utility.
bad-data-guide
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
databricks-aws-monitoring
Monitoring Databricks with AWS CloudWatch
docker-presto-cluster
Multi-node Presto cluster on Docker containers
e2chk
geolatlong
geolatlong provides latitude/longitude-to-city mapping
referer-parser
Referrer parsing library in Scala
scala-primer
Scala Workshop for Spark
spark-workshop
saj1th's Repositories
saj1th/databricks-aws-monitoring
Monitoring Databricks with AWS CloudWatch
saj1th/docker-presto-cluster
Multi-node Presto cluster on Docker containers
saj1th/e2chk
saj1th/ad_attrib
saj1th/ad_attrib_repo
saj1th/benchbase
Multi-DBMS SQL Benchmarking Framework via JDBC
saj1th/coral
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
saj1th/databricks-cli
Command Line Interface for Databricks
saj1th/Databricks-GPU-Serving-Examples
Databricks GPU Model Serving Example Scripts
saj1th/databricks-maven-plugin
saj1th/databricks_rag_demo
saj1th/db-migration
Databricks Migration Tools
saj1th/delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
saj1th/docker-spark-iceberg
saj1th/flog
A fake log generator for common log formats
saj1th/hudi
Upserts, Deletes And Incremental Processing on Big Data.
saj1th/json-data-generator
A robust, generic, streaming random json data generator for your data
saj1th/kitchensink
saj1th/lhbench
Lakehouse storage system benchmark
saj1th/lhbench-notebooks
saj1th/log4j-json-layout
Log4j layout that formats logs as Logstash JSON
saj1th/lst-bench
LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as Delta Lake, Apache Hudi, and Apache Iceberg.
saj1th/saj1th.github.io
saj1th/spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
saj1th/spark-sql-dsv2-extension
A SQL extension built on the Spark 3 DataSource V2 API, e.g. Hive V2 catalog support across multiple clusters
saj1th/spark-statsd
Factors the StatsdSink out of Spark
saj1th/sparkMeasure
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
saj1th/terraform-aws-presto
Terraform module to create Presto cluster
saj1th/usql
Universal command-line interface for SQL databases
saj1th/vmware-go-kcl
KCL (Kinesis Client Library) implementation in Go by VMware