Pinned Repositories
aardpfark
A library for exporting Spark ML models and pipelines to PFA
algos
Random collection of algorithms
azkaban
Azkaban workflow manager.
cascading
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows on a Hadoop cluster. See https://github.com/Cascading/cascading for the release repository.
cuda-playground
cudf
cuDF - GPU DataFrame Library
elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
gerashegalov.github.io
blog
hadoop
Mirror of Apache Hadoop
rapids-shell
Utility to run/debug Spark RAPIDS in REPL
gerashegalov's Repositories
gerashegalov/rapids-shell
Utility to run/debug Spark RAPIDS in REPL
gerashegalov/aardpfark
A library for exporting Spark ML models and pipelines to PFA
gerashegalov/azkaban
Azkaban workflow manager.
gerashegalov/cascading
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows on a Hadoop cluster. See https://github.com/Cascading/cascading for the release repository.
gerashegalov/cuda-playground
gerashegalov/cudf
cuDF - GPU DataFrame Library
gerashegalov/elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
gerashegalov/gerashegalov.github.io
blog
gerashegalov/hadoop
Mirror of Apache Hadoop
gerashegalov/hdfs-mount
A tool to mount HDFS as a local Linux file system
gerashegalov/Impala
Real-time Query for Hadoop
gerashegalov/Impatient
Source examples to support the "Cascading for the Impatient" blog post series
gerashegalov/minihadoop
gerashegalov/parquet-mr
Mirror of Apache Parquet
gerashegalov/presto
Distributed SQL query engine for running interactive analytic queries against big data sources.
gerashegalov/rmm
RAPIDS Memory Manager
gerashegalov/scalaj-http
Simple Scala wrapper for HttpURLConnection. OAuth included.
gerashegalov/scalding
A Scala API for Cascading
gerashegalov/schema-registry
Confluent Schema Registry for Kafka
gerashegalov/spark
Mirror of Apache Spark
gerashegalov/spark-rapids
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
gerashegalov/spark-rapids-benchmarks
Spark RAPIDS Benchmarks – benchmark sets and utilities for the RAPIDS Accelerator for Apache Spark
gerashegalov/spark-rapids-examples
A repo for all Spark examples using the RAPIDS Accelerator, including ETL, ML/DL, etc.
gerashegalov/spark-rapids-jni
RAPIDS Accelerator JNI For Apache Spark
gerashegalov/t-digest
A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
gerashegalov/takari-local-repository
gerashegalov/testsplits
Standalone tool to benchmark LzoInputFormat getSplits performance
gerashegalov/TransmogrifAI
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Spark with minimal hand tuning
gerashegalov/transmogrifai-helloworld-sbt
gerashegalov/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow