big-data

There are 4036 repositories under the big-data topic.

  • iotdb

    Apache IoTDB

    Language: Java · Stars: 4.3k
  • vue-virtual-scroll-list

    ⚡️ A Vue component that supports large data lists with high rendering performance and efficiency.

    Language: JavaScript · Stars: 4.3k
  • crate

    CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

    Language: Java · Stars: 4k
  • fastjson2

    🚄 FASTJSON2 is a Java JSON library with excellent performance.

    Language: Java · Stars: 3.5k
  • img2dataset

    Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.

    Language: Python · Stars: 3.4k
  • koalas

    Koalas: pandas API on Apache Spark

    Language: Python · Stars: 3.3k
  • GraphScope

    🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba

    Language: C++ · Stars: 3.1k
  • CBoard

    An easy to use, self-service open BI reporting and BI dashboard platform.

    Language: JavaScript · Stars: 3k
  • Data-Science-Roadmap

    Data Science Roadmap from A to Z

  • incubator-hugegraph

    A graph database that supports more than 100 billion data items, with high performance and scalability (includes an OLTP engine, a REST API, and backends)

    Language: Java · Stars: 2.6k
  • featurebase

    A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase

    Language: Go · Stars: 2.5k
  • parquet-java

    Apache Parquet

    Language: Java · Stars: 2.5k
  • NakedTensor

    Bare-bones examples of machine learning in TensorFlow

    Language: Python · Stars: 2.4k
  • alldata

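    AllData, an open-source data middle platform project: with a data platform as its foundation, a data middle platform as the bridge, a machine learning platform as the mid-tier framework, and large-model applications as the upstream product, it provides an end-to-end digitalization solution. Join the tech community: https://docs.qq.com/doc/DVHlkSEtvVXVCdEFo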

    Language: Java · Stars: 2.3k
  • LakeSoul

    LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.

    Language: Java · Stars: 2.3k
  • ambari

    Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.

    Language: Java · Stars: 2.1k
  • paimon

    Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

    Language: Java · Stars: 2k
  • quary

    Open-source BI for engineers

    Language: Rust · Stars: 2k
  • poseidon

    A search engine which can hold 100 trillion lines of log data.

    Language: Go · Stars: 2k
  • drill

    Apache Drill is a distributed MPP query layer for self-describing data

    Language: Java · Stars: 1.9k
  • bookkeeper

    Apache BookKeeper: a scalable, fault-tolerant, low-latency storage service optimized for append-only workloads

    Language: Java · Stars: 1.9k
  • kudu

    Mirror of Apache Kudu

    Language: C++ · Stars: 1.8k
  • Daft

    Distributed DataFrame for Python designed for the cloud, powered by Rust

    Language: Rust · Stars: 1.8k
  • ytsaurus

    YTsaurus is a scalable and fault-tolerant open-source big data platform.

    Language: C++ · Stars: 1.8k
  • Gaffer

    A large-scale entity and relation database supporting aggregation of properties

    Language: Java · Stars: 1.7k
  • genie

    Distributed Big Data Orchestration Service

    Language: Java · Stars: 1.7k
  • parquet-format

    Apache Parquet

    Language: Thrift · Stars: 1.7k
  • spark-py-notebooks

    Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

    Language: Jupyter Notebook · Stars: 1.6k
  • moosefs

    MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

    Language: C · Stars: 1.6k
  • bitsail

    BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data records every day.

    Language: Java · Stars: 1.6k
  • just-dashboard

    📊 📋 Dashboards using YAML or JSON files

    Language: JavaScript · Stars: 1.6k
  • fluid

    Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)

    Language: Go · Stars: 1.6k
  • mysql_perf_analyzer

    MySQL performance monitoring and analysis.

    Language: Java · Stars: 1.4k
  • carbondata

    High performance data store solution

    Language: Scala · Stars: 1.4k
  • matano

    Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS

    Language: Rust · Stars: 1.4k
  • datafusion-ballista

    Apache Arrow Ballista Distributed Query Engine

    Language: Rust · Stars: 1.4k