nmukerje

nmukerje's Stars

apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
Language: Java, 5.7k stars
ThilinaRajapakse/simpletransformers
Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
Language: Python, 4.2k stars
utterworks/fast-bert
Super easy library for BERT-based NLP models
Language: Python, 1.9k stars
LucaCanali/sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Language: Scala, 736 stars
aws-samples/emr-serverless-samples
Example code for running Spark and Hive jobs on EMR Serverless.
Language: Python, 160 stars
yaojiach/docker-dash
Docker Dash (Plotly)
Language: Python, 61 stars
dacort/modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Language: Jupyter Notebook, 48 stars
awslabs/amazon-athena-cross-account-catalog
🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena
Language: Python, 29 stars
aws-samples/emr-on-eks-benchmark
Language: Scala, 27 stars
ev2900/EMR_Studio_Hudi
Apache Hudi examples designed to be run on AWS Elastic MapReduce (EMR) via EMR Studio or EMR Notebooks
Language: Jupyter Notebook, 3 stars
dacort/sample-code
Various code bits I run into
Language: Java, 1 star