big-data
There are 3,976 repositories under the big-data topic.
binhnguyennus/awesome-scalability
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
ClickHouse/ClickHouse
ClickHouse® is a free analytics DBMS for big data
donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
apache/flink
Apache Flink
amark/gun
An open source cybersecurity protocol for syncing decentralized graph data.
prestodb/presto
The official home of the Presto distributed SQL query engine for big data
heibaiying/BigData-Notes
A beginner's guide to big data :star:
questdb/questdb
An open source time-series database for fast ingest and SQL queries
andkret/Cookbook
The Data Engineering Cookbook
apache/predictionio
PredictionIO, a machine learning server for developers and ML engineers.
yahoo/CMAK
CMAK is a tool for managing Apache Kafka clusters
vesoft-inc/nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
cython/cython
The most widely used Python to C compiler
provectus/kafka-ui
Open-Source Web UI for Apache Kafka Management
StarRocks/starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
catboost/catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
apache/beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
delta-io/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
h2oai/h2o-3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
risingwavelabs/risingwave
SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.
apache/zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
quickwit-oss/quickwit
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
arkime/arkime
Arkime is an open source, large scale, full packet capturing, indexing, and database system.
pachyderm/pachyderm
Data-Centric Pipelines and Data Versioning
apache/couchdb
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
hazelcast/hazelcast
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
vespa-engine/vespa
AI + Data, online. https://vespa.ai
apache/hive
Apache Hive
feast-dev/feast
The Open Source Feature Store for Machine Learning
apache/datafusion
Apache DataFusion SQL Query Engine
microsoft/SynapseML
Simple and Distributed Machine Learning
tschellenbach/Stream-Framework
Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.
apache/ignite
Apache Ignite
apache/calcite
Apache Calcite