wenxuanguan's Stars
EbookFoundation/free-programming-books
📚 Freely available programming books
cncf/landscape
🌄 The Cloud Native Interactive Landscape filters and sorts hundreds of projects and products, and shows details including GitHub stars, funding, first and last commits, contributor counts and headquarters location.
jigish/slate
A window management application (replacement for Divvy/SizeUp/ShiftIt)
JerryLead/SparkInternals
Notes on the design and implementation of Apache Spark
facebookincubator/velox
A composable and fully extensible C++ execution engine library for data management systems.
tomwhite/hadoop-book
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
databricks/scala-style-guide
Databricks Scala Coding Style Guide
apache/logging-flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-like data
awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
linkedin/dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
sbt/sbt-dependency-graph
sbt plugin to create a dependency graph for your project
graphframes/graphframes
druid-io/tranquility
Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover, seamlessly and without downtime.
rxin/jvm-readings
JVM readings
japila-books/spark-sql-internals
The Internals of Spark SQL
japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
ekampf/PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
awesome-spark/spark-gotchas
Spark Gotchas. A subjective compilation of Apache Spark tips and tricks
swartzrock/LearningScalaMaterials
Supplementary materials for the "Learning Scala" book from O'Reilly Media
neoremind/kraps-rpc
An RPC framework leveraging the Spark RPC module
aliyun/aliyun-emapreduce-datasources
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
allwefantasy/spark-binlog
A library for querying Binlog with Apache Spark Structured Streaming, for Spark SQL, DataFrames and [MLSQL](https://www.mlsql.tech).
jaceklaskowski/kafka-notebook
The Internals of Apache Kafka
aliyun/aliyun-emapreduce-demo
apache/directory-kerby
Mirror of Apache Directory Kerby
jaceklaskowski/mastering-kafka-streams-book
No longer maintained and soon to be deleted
rxin/jvm-unsafe-utils
Fast JVM collection
linzebing/MiniSpark
Java implementation of a mini Spark-like framework named MiniSpark that can run on top of an HDFS cluster. MiniSpark supports operators including Map, FlatMap, MapPair, Reduce, ReduceByKey, Collect, Count, Parallelize, Join and Filter.
harishreedharan/usingflumecode
Companion code for the "Using Flume" book
direct-spark-sql/direct-spark-sql
A hyper-optimized single-node (local) version of the Spark SQL engine, whose fundamental data structure is a Scala Iterator rather than an RDD.