ryantotti's Stars
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
laurent22/joplin
Joplin - the privacy-focused note taking app with sync capabilities for Windows, macOS, Linux, Android and iOS.
WangRongsheng/awesome-LLM-resourses
🧑🚀 Summary of the world's best LLM resources.
DataLinkDC/dinky
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
huaweicloud/obsa-hdfs
risingwavelabs/risingwave
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
cubefs/compass
Compass is a task diagnosis platform for big data.
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
apache/paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
lakesoul-io/LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
apache/celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
flink-extended/flink-remote-shuffle
Remote Shuffle Service for Flink
delta-io/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Intel-bigdata/SSM
Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution
dremio/dremio-oss
Dremio - the missing link in modern data
StarRocks/starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
fluid-cloudnative/fluid
Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)
dCache/nfs4j
Pure Java NFSv3 and NFSv4.2 implementation
databendlabs/databend
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
tencentyun/hadoop-cos
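hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on Tencent Cloud COS can be read and written the same way as data on HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.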
kahing/goofys
A high-performance, POSIX-ish Amazon S3 file system written in Go
seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
eucalyptus/eucalyptus
Eucalyptus Cloud-computing Platform
smarty-prototypes/go-disruptor
A port of the LMAX Disruptor to the Go language.
yireyun/go-queue
High-performance lock-free queue (Disruptor 1400/s)
Tencent/Tendis
Tendis is a high-performance distributed storage system fully compatible with the Redis protocol.
RedisLabs/redisraft
A Redis Module that makes it possible to create a consistent Raft cluster from multiple Redis instances.
Hexilee/tifs
A distributed POSIX filesystem based on TiKV, with partition tolerance and strict consistency.
distributedio/titan
A Distributed Redis Protocol Compatible NoSQL Database