Pinned Repositories
satisfaction
The Next Generation Hadoop Scheduler
ambrose
A platform for visualization and real-time monitoring of data workflows
brickhouse
Hive UDFs for the data warehouse
experimental_bigdata-interop
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
reair
ReAir is a collection of easy-to-use tools for replicating tables and partitions between Hive data warehouses.
sbt-satisfy
sbt plugin for Satisfaction
jeromebanks's Repositories
jeromebanks/brickhouse
Hive UDFs for the data warehouse
jeromebanks/experimental_bigdata-interop
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
jeromebanks/satisfaction
The Next Generation Hadoop Scheduler
jeromebanks/reair
ReAir is a collection of easy-to-use tools for replicating tables and partitions between Hive data warehouses.
jeromebanks/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
jeromebanks/artemis-corpus-test-framework
A test framework for working with test corpora for unit tests.
jeromebanks/aws-glue-data-catalog-client-for-apache-hive-metastore
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore-compatible metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog, and it may be ported to other Hive Metastore-compatible platforms, such as other Hadoop and Apache Spark distributions.
jeromebanks/boilerpipe
Work-in-progress transfer from Google Code
jeromebanks/Chat-with-Github-Repo
This repository contains two Python scripts that demonstrate how to create a chatbot using Streamlit, OpenAI GPT-3.5-turbo, and Activeloop's Deep Lake.
jeromebanks/classutil
Scala-friendly, fast class-finder library (using ASM under the covers)
jeromebanks/docker-spark-k8s-aws
Docker image for running Spark 3 on Kubernetes on AWS
jeromebanks/document-api-python
Create and modify Tableau workbook and datasource files
jeromebanks/experimental_spark-bigquery
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
jeromebanks/experimental_spark-bigquery-1
Google BigQuery support for Spark, SQL, and DataFrames
jeromebanks/generalized-kmeans-clustering
This project generalizes the Spark MLlib batch and streaming K-Means clusterers in every practical way.
jeromebanks/incubator-hivemall
Mirror of Apache Hivemall (incubating)
jeromebanks/influxdb-java
Java client for InfluxDB
jeromebanks/js-murmur3-128
A JavaScript implementation of the 128-bit variant of Murmur3 (compatible with Guava)
jeromebanks/nutch
Apache Nutch
jeromebanks/okhttp
An HTTP+HTTP/2 client for Android and Java applications.
jeromebanks/reactive-kafka
Reactive Streams API for Apache Kafka
jeromebanks/redshift-auto-schema
Redshift Auto Schema is a Python library that takes a delimited flat file or Parquet file as input, parses it, and provides a variety of functions for creating and validating tables in Amazon Redshift.
jeromebanks/sbt-google-cloud-storage
An sbt resolver and publisher for Google Cloud Storage
jeromebanks/scala.rx
An experimental library for Functional Reactive Programming in Scala
jeromebanks/spark
jeromebanks/spark-glue
Spark releases with AWS Glue support
jeromebanks/spark-on-k8s-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
jeromebanks/spark-on-kubernetes-docker
Spark on Kubernetes infrastructure Docker images repo
jeromebanks/spark-on-kubernetes-helm
Spark on Kubernetes infrastructure Helm charts repo
jeromebanks/terrapin
Serving system for batch generated data sets