big-data

There are 4,036 repositories under the big-data topic.

iotdb
Apache IoTDB
Language: Java · Stars: 4.3k
vue-virtual-scroll-list
⚡️ A Vue component that supports large data lists with high, efficient rendering performance.
Language: JavaScript · Stars: 4.3k
crate
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
Language: Java · Stars: 4k
fastjson2
🚄 FASTJSON2 is a Java JSON library with excellent performance.
Language: Java · Stars: 3.5k
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Language: Python · Stars: 3.4k
koalas
Koalas: pandas API on Apache Spark
Language: Python · Stars: 3.3k
GraphScope
🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
Language: C++ · Stars: 3.1k
CBoard
An easy-to-use, self-service, open BI reporting and dashboard platform.
Language: JavaScript · Stars: 3k
Data-Science-Roadmap
Data Science Roadmap from A to Z
Stars: 3k
incubator-hugegraph
A graph database that supports more than 100 billion data entries with high performance and scalability (includes OLTP engine, REST API, and backends)
Language: Java · Stars: 2.6k
featurebase
A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase
Language: Go · Stars: 2.5k
parquet-java
Apache Parquet
Language: Java · Stars: 2.5k
NakedTensor
Bare-bones examples of machine learning in TensorFlow
Language: Python · Stars: 2.4k
alldata
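AllData is an open-source data middle platform project. It uses the data platform as its foundation, the data middle platform as its bridge, the machine learning platform as its mid-level framework, and large-model applications as its upstream products, providing a full-chain digitalization solution. Join the technical community: https://docs.qq.com/doc/DVHlkSEtvVXVCdEFo
Language: Java · Stars: 2.3k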
LakeSoul
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Language: Java · Stars: 2.3k
ambari
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Language: Java · Stars: 2.1k
paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Language: Java · Stars: 2k
quary
Open-source BI for engineers
Language: Rust · Stars: 2k
poseidon
A search engine which can hold 100 trillion lines of log data.
Language: Go · Stars: 2k
drill
Apache Drill is a distributed MPP query layer for self-describing data
Language: Java · Stars: 1.9k
bookkeeper
Apache BookKeeper - a scalable, fault-tolerant, and low-latency storage service optimized for append-only workloads
Language: Java · Stars: 1.9k
kudu
Mirror of Apache Kudu
Language: C++ · Stars: 1.8k
Daft
Distributed DataFrame for Python designed for the cloud, powered by Rust
Language: Rust · Stars: 1.8k
ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Language: C++ · Stars: 1.8k
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Language: Java · Stars: 1.7k
genie
Distributed Big Data Orchestration Service
Language: Java · Stars: 1.7k
parquet-format
Apache Parquet
Language: Thrift · Stars: 1.7k
spark-py-notebooks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Language: Jupyter Notebook · Stars: 1.6k
moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Language: C · Stars: 1.6k
bitsail
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data records every day.
Language: Java · Stars: 1.6k
just-dashboard
📊 📋 Dashboards using YAML or JSON files
Language: JavaScript · Stars: 1.6k
fluid
Fluid, elastic data abstraction and acceleration for BigData/AI applications in the cloud. (A CNCF project)
Language: Go · Stars: 1.6k
mysql_perf_analyzer
MySQL performance monitoring and analysis.
Language: Java · Stars: 1.4k
carbondata
High performance data store solution
Language: Scala · Stars: 1.4k
matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
Language: Rust · Stars: 1.4k
datafusion-ballista
Apache Arrow Ballista Distributed Query Engine
Language: Rust · Stars: 1.4k