wenxuanguan

wenxuanguan's Stars

EbookFoundation/free-programming-books
📚 Freely available programming books
Language: HTML 341k 9.7k 1.1k61.9k
cncf/landscape
🌄 The Cloud Native Interactive Landscape filters and sorts hundreds of projects and products, and shows details including GitHub stars, funding, first and last commits, contributor counts and headquarters location.
9.4k 415 7002k
jigish/slate
A window management application (replacement for Divvy/SizeUp/ShiftIt)
Language: Objective-C 7.8k 158 470512
JerryLead/SparkInternals
Notes talking about the design and implementation of Apache Spark
5.3k 618 331.8k
facebookincubator/velox
A composable and fully extensible C++ execution engine library for data management systems.
Language: C++ 3.5k 112 2.2k1.2k
tomwhite/hadoop-book
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
Language: Makefile 3.5k 449 312.6k
databricks/scala-style-guide
Databricks Scala Coding Style Guide
2.7k 140 18579
apache/logging-flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-like data
Language: Java 2.5k 225 11.6k
awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
Language: Shell 1.7k 85 73332
linkedin/dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Language: Java 1.4k 128 316859
sbt/sbt-dependency-graph
sbt plugin to create a dependency graph for your project
Language: Scala 1.2k 54 131113
graphframes/graphframes
Language: Scala 1k 59 264238
druid-io/tranquility
Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover, seamlessly and without downtime.
Language: Scala 516 59 159230
rxin/jvm-readings
JVM readings
480 57 0109
japila-books/spark-sql-internals
The Internals of Spark SQL
457 17 5130
japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
416 40 7171
ekampf/PySpark-Boilerplate
A boilerplate for writing PySpark Jobs
Language: Python 393 19 2156
awesome-spark/spark-gotchas
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
359 33 1380
swartzrock/LearningScalaMaterials
Supplementary materials for the "Learning Scala" book from O'Reilly Media
237 17 4130
neoremind/kraps-rpc
An RPC framework leveraging the Spark RPC module
Language: Scala 211 14 4106
aliyun/aliyun-emapreduce-datasources
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Language: Scala 168 38 10888
allwefantasy/spark-binlog
A library for querying Binlog with Apache Spark Structured Streaming, for Spark SQL, DataFrames and [MLSQL](https://www.mlsql.tech).
Language: Scala 154 9 2454
jaceklaskowski/kafka-notebook
The Internals of Apache Kafka
132 11 049
aliyun/aliyun-emapreduce-demo
Language: Java 121 20 2054
apache/directory-kerby
Mirror of Apache Directory Kerby
Language: Java 110 25 072
jaceklaskowski/mastering-kafka-streams-book
No longer maintained and soon to be deleted
76 9 036
rxin/jvm-unsafe-utils
Fast JVM collection
Language: Java 58 16 016
linzebing/MiniSpark
Java implementation of a mini Spark-like framework named MiniSpark that can run on top of an HDFS cluster. MiniSpark supports operators including Map, FlatMap, MapPair, Reduce, ReduceByKey, Collect, Count, Parallelize, Join and Filter.
Language: Java 33 3 113
harishreedharan/usingflumecode
Companion Code for Using Flume Book
Language: Java 32 8 146
direct-spark-sql/direct-spark-sql
A hyper-optimized single-node (local) version of the Spark SQL engine, whose fundamental data structure is the Scala Iterator rather than the RDD.
Language: Scala 12 3 29