Pinned Repositories
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airflow-essentials
Materials for Airflow training
airflow_project
scaffold of Apache Airflow executing Docker containers
AirlineReservationSystem
AirlineReservationSystem
ambari
Mirror of Apache Ambari
ansible
Ansible
atlas
Apache Atlas
kdd_competition
KDD competition (Knowledge discovery and Data mining)
mapreduce_cwt
Simple MapReduce application that performs a word count, with MRUnit tests
nypd_open_data
NYPD open data analysis
kbohra's Repositories
kbohra/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
kbohra/airflow-essentials
Materials for Airflow training
kbohra/airflow_project
scaffold of Apache Airflow executing Docker containers
kbohra/atlas
Apache Atlas
kbohra/awesome-apache-airflow
Curated list of resources about Apache Airflow
kbohra/azure-quickstart-templates
Azure Quickstart Templates
kbohra/book-project
Book tracker web app
kbohra/brickhouse
Hive UDFs for the data warehouse
kbohra/coral
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
kbohra/data-pipelines-with-airflow-2nd-ed
Code for the second edition of Data Pipelines with Apache Airflow Book
kbohra/delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
kbohra/delta-lake-definitive-guide
kbohra/distsys-class
Class materials for a distributed systems lecture series
kbohra/ghidra
kbohra/grok
Grok open release
kbohra/hive
Mirror of Apache Hive
kbohra/hive-testbench
kbohra/introduction_to_ml_with_python
Notebooks and code for the book "Introduction to Machine Learning with Python"
kbohra/jumbune
Jumbune is an open-source Proactive ML based BigData platform performance accelerator & automated data quality management platform. Commercial offering is available at http://jumbune.com. More details of open source offering are at,
kbohra/jvm-profiler
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
kbohra/kafka-connect-hdfs
Kafka Connect HDFS connector
kbohra/llama-hub
A library of data loaders for LLMs made by the community -- to be used with GPT Index and/or LangChain
kbohra/NNAnalytics
NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.
kbohra/overwatch
Capture deep metrics on one or all assets within a Databricks workspace
kbohra/presto
The official home of the Presto distributed SQL query engine for big data
kbohra/salt
Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
kbohra/spark
Mirror of Apache Spark
kbohra/statusTracker
Monitor the status of cloud services with a Python-based application
kbohra/stocksight
Crowd-sourced stock analyzer and predictor using Elasticsearch, Twitter, News headlines and Python natural language processing and sentiment analysis
kbohra/yt-dlc
media downloader for various sites.