big-data

There are 4837 repositories under the big-data topic.

binhnguyennus/awesome-scalability
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
65.4k 1.9k 0 6.6k
ClickHouse/ClickHouse
ClickHouse® is a real-time analytics database management system
Language: C++ 42.9k 687 25.3k 7.7k
apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
Language: Scala 41.9k 2k 0 28.8k
donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Language: Python 28.5k 1.6k 418k
apache/flink
Apache Flink
Language: Java 25.3k 930 0 13.8k
amark/gun
An open source cybersecurity protocol for syncing decentralized graph data.
Language: JavaScript 18.6k 316 8081.2k
heibaiying/BigData-Notes
A beginner's guide to big data ⭐
Language: Java 16.6k 448 424.3k
prestodb/presto
The official home of the Presto distributed SQL query engine for big data
Language: Java 16.5k 843 7k 5.5k
andkret/Cookbook
The Data Engineering Cookbook
Language: Python 14.5k 546 1362.6k
apache/predictionio
PredictionIO, a machine learning server for developers and ML engineers.
Language: Scala 12.5k 753 0 1.9k
yahoo/CMAK
CMAK is a tool for managing Apache Kafka clusters
Language: Scala 11.9k 525 6882.5k
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Language: Java 11.9k 179 7.4k 3.3k
vesoft-inc/nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Language: C++ 11.7k 190 2.6k 1.3k
provectus/kafka-ui
Open-Source Web UI for Apache Kafka Management
Language: Java 11.3k 74 1.8k 1.3k
StarRocks/starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
Language: Java 10.7k 187 8.9k 2.1k
quickwit-oss/quickwit
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
Language: Rust 10.4k 73 2.4k 485
cython/cython
The most widely used Python to C compiler
Language: Python 10.3k 236 4k 1.6k
catboost/catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Language: C++ 8.6k 190 2.5k 1.2k
apache/beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
Language: Java 8.3k 257 7.9k 4.4k
delta-io/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Language: Scala 8.3k 216 1.7k 1.9k
apache/datafusion
Apache DataFusion SQL Query Engine
Language: Rust 7.7k 112 6.8k 1.6k
h2oai/h2o-3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Language: Jupyter Notebook 7.3k 380 9.6k 2k
arkime/arkime
Arkime is an open source, large scale, full packet capturing, indexing, and database system.
Language: JavaScript 6.9k 346 1.5k 1.1k
apache/couchdb
Seamless multi-primary syncing database with an intuitive HTTP/JSON API, designed for reliability
Language: Erlang 6.7k 230 1.7k 1.1k
apache/zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Language: Java 6.6k 302 0 2.8k
hazelcast/hazelcast
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
Language: Java 6.4k 293 8.7k 1.9k
vespa-engine/vespa
AI + Data, online. https://vespa.ai
Language: Java 6.3k 155 1.1k 659
feast-dev/feast
The Open Source Feature Store for AI/ML
Language: Python 6.3k 74 1.7k 1.1k
pachyderm/pachyderm
Data-Centric Pipelines and Data Versioning
Language: Go 6.3k 157 3.1k 566
apache/iotdb
Apache IoTDB
Language: Java 5.9k 119 9251.1k
apache/hive
Apache Hive
Language: Java 5.8k 318 0 4.8k
microsoft/SynapseML
Simple and Distributed Machine Learning
Language: Scala 5.2k 141 758850
apache/ignite
Apache Ignite
Language: Java 5k 270 2141.9k
apache/calcite
Apache Calcite
Language: Java 4.9k 164 0 2.4k
tschellenbach/Stream-Framework
Stream Framework is a Python library, which allows you to build news feed, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Language: Python 4.7k 209 184536
tangbc/vue-virtual-scroll-list
⚡️ A Vue component that supports large data lists with high rendering performance and efficiency.
Language: JavaScript 4.5k 64 356605
