Pinned Repositories
docker-scripts
Docker scripts for daily development
HappyHadooping
an automatic tool to deploy Hadoop on EC2
HederaInFloodlight
Implementation of Hedera based on Floodlight
KittenWhisker
debugging performance issues for Spark applications
LoadWeaver
a flexible and lightweight workload generator for Hadoop 1.x
LongTermFairScheduler
LongTermFairScheduler
mininet_stuffs
a fat-tree topology developed within the Mininet environment
Self-Learning-Notebooks
RLLearning
xgboost4j-spark-scalability
a benchmark to test scalability of xgboost4j-spark and relevant projects
XGBoostExperiments
repo containing XGBoost-based ML projects for various purposes
CodingCat's Repositories
CodingCat/xgboost4j-spark-scalability
a benchmark to test scalability of xgboost4j-spark and relevant projects
CodingCat/Self-Learning-Notebooks
RLLearning
CodingCat/spark
Mirror of Apache Spark
CodingCat/analytics-zoo
Distributed TensorFlow, Keras and BigDL on Apache Spark
CodingCat/arrow-datafusion
Apache Arrow DataFusion and Ballista query engines
CodingCat/BigDL
BigDL: Distributed Deep Learning Library for Apache Spark
CodingCat/celeborn-website
Apache Celeborn Site
CodingCat/cockroachdb-todo-apps
CockroachDB To-Do Apps
CodingCat/cockroachdb_playground
some programs to play around with CockroachDB
CodingCat/delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
CodingCat/dmlc-core
A common bricks library for building scalable and portable distributed machine learning.
CodingCat/ec2-selector-cli
a CLI tool to select EC2 instances based on filters
CodingCat/frameless
Expressive types for Spark.
CodingCat/gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
CodingCat/github-markdown-toc
Easy TOC creation for GitHub README.md
CodingCat/gluten
CodingCat/how-query-engines-work
This is the companion repository for the book How Query Engines Work.
CodingCat/iceberg
Apache Iceberg
CodingCat/incubator-celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
CodingCat/incubator-sedona
A cluster computing framework for processing large-scale geospatial data
CodingCat/incubator-uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
CodingCat/morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
CodingCat/noisepage
Self-Driving Database Management System from Carnegie Mellon University
CodingCat/rabit
Reliable Allreduce and Broadcast Interface for distributed machine learning
CodingCat/spark-lineage
Spark SQL listener to record lineage information
CodingCat/spark-sql-macros
Spark SQL Macros provides a mechanism similar to Spark User-Defined function registration; with the key enhancement being that custom code gets compiled to equivalent Catalyst Expressions at macro define time.
CodingCat/string_encoder
CodingCat/terraform-aws-eks-node-group
Terraform module to provision a fully managed AWS EKS Node Group
CodingCat/velox-intel
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
CodingCat/xgboost
Large-scale and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, on single node, Hadoop YARN and more.