nmukerje's Stars
apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
ThilinaRajapakse/simpletransformers
Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
utterworks/fast-bert
Super easy library for BERT-based NLP models
LucaCanali/sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
aws-samples/emr-serverless-samples
Example code for running Spark and Hive jobs on EMR Serverless.
yaojiach/docker-dash
Docker Dash (Plotly)
dacort/modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
awslabs/amazon-athena-cross-account-catalog
🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena
aws-samples/emr-on-eks-benchmark
ev2900/EMR_Studio_Hudi
Apache Hudi examples designed to be run on AWS Elastic MapReduce (EMR) via EMR Studio or EMR Notebooks
dacort/sample-code
Various code bits I run into