big-data
There are 4036 repositories under the big-data topic.
iotdb
Apache IoTDB
vue-virtual-scroll-list
⚡️ A Vue component that supports very large data lists with high, efficient rendering performance.
crate
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
fastjson2
🚄 FASTJSON2 is a Java JSON library with excellent performance.
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
koalas
Koalas: pandas API on Apache Spark
GraphScope
🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
CBoard
An easy-to-use, self-service, open BI reporting and dashboard platform.
Data-Science-Roadmap
Data Science Roadmap from A to Z
incubator-hugegraph
A graph database that supports 100+ billion data items with high performance and scalability (includes OLTP engine, REST API, and backends)
featurebase
A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase
parquet-java
Apache Parquet
NakedTensor
Bare bone examples of machine learning in TensorFlow
alldata
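AllData is an open-source data middle platform project. It uses a data platform as its foundation, a data middle platform as the bridge, a machine learning platform as the mid-level framework, and large-model applications as upstream products, providing a full-chain digitalization solution. Join the technical community: https://docs.qq.com/doc/DVHlkSEtvVXVCdEFo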
LakeSoul
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
ambari
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
quary
Open-source BI for engineers
poseidon
A search engine which can hold 100 trillion lines of log data.
drill
Apache Drill is a distributed MPP query layer for self-describing data
bookkeeper
Apache BookKeeper - a scalable, fault-tolerant, low-latency storage service optimized for append-only workloads
kudu
Mirror of Apache Kudu
Daft
Distributed DataFrame for Python designed for the cloud, powered by Rust
ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Gaffer
A large-scale entity and relation database supporting aggregation of properties
genie
Distributed Big Data Orchestration Service
parquet-format
Apache Parquet
spark-py-notebooks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
bitsail
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
just-dashboard
📊 📋 Dashboards using YAML or JSON files
fluid
Fluid: elastic data abstraction and acceleration for Big Data/AI applications in the cloud. (Project under CNCF)
mysql_perf_analyzer
MySQL performance monitoring and analysis.
carbondata
High performance data store solution
matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
datafusion-ballista
Apache Arrow Ballista Distributed Query Engine