chenlj's Stars
OpenLineage/OpenLineage
An Open Standard for lineage metadata collection
apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
JerryLead/SparkInternals
Notes on the design and implementation of Apache Spark
facebook/rocksdb
A library that provides an embeddable, persistent key-value store for fast storage.
apache/paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
MaterializeInc/materialize
The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
risingwavelabs/risingwave
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
apache/dolphinscheduler
Apache DolphinScheduler is a modern data orchestration platform for quickly building high-performance workflows with low code.
apache/doris
Apache Doris is an easy-to-use, high-performance, unified analytics database.
hashicorp/raft
Golang implementation of the Raft consensus protocol
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
apache/flink-cdc
Flink CDC is a streaming data integration tool
apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
alibaba/DataX
DataX is the open-source version of Alibaba Cloud DataWorks Data Integration.
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
google-research/bert
TensorFlow code and pre-trained models for BERT
panjf2000/gnet
🚀 gnet is a high-performance, lightweight, non-blocking, event-driven networking framework written in pure Go.
apache/skywalking
APM, Application Performance Monitoring System
ClickHouse/ClickHouse
ClickHouse® is a real-time analytics DBMS
prestodb/presto
The official home of the Presto distributed SQL query engine for big data
apache/kudu
Mirror of Apache Kudu
heibaiying/BigData-Notes
A beginner's guide to big data ⭐
apache/pulsar
Apache Pulsar - distributed pub-sub messaging system
alibaba/canal
Alibaba's MySQL binlog incremental subscription and consumption component
lni/dragonboat
A feature-complete, high-performance multi-group Raft library in Go.
prometheus/prometheus
The Prometheus monitoring system and time series database.
microsoft/RulesEngine
A JSON-based rules engine with extensive dynamic expression support
dapr/dapr
Dapr is a portable, event-driven runtime for building distributed applications across cloud and edge.
HackerNews/API
Documentation and Samples for the Official HN API
torvalds/linux
Linux kernel source tree