bigdata

There are 2306 repositories under the bigdata topic.

  • DataExpert-io/data-engineer-handbook

    This is a repo with links to everything you'd ever want to learn about data engineering

    Language: Jupyter Notebook · Stars: 38.6k
  • TDengine

    taosdata/TDengine

    High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios

    Language: C · Stars: 24.5k
  • apache/shardingsphere

    Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.

    Language: Java · Stars: 20.5k
  • heibaiying/BigData-Notes

    A beginner's guide to big data :star:

    Language: Java · Stars: 16.7k
  • oxnr/awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • juicefs

    juicedata/juicefs

    JuiceFS is a distributed POSIX file system built on top of Redis and S3.

    Language: Go · Stars: 12.4k
  • rustfs/rustfs

    🚀2.3x Faster than MinIO for 4K Small Files. RustFS is an open-source, S3-compatible high-performance object storage system supporting migration and coexistence with other S3-compatible platforms such as MinIO and Ceph.

    Language: Rust · Stars: 11.2k
  • wangzhiwubigdata/God-Of-BigData

    Focused on big data learning and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

  • databend

    databendlabs/databend

    AI-Native Data Warehouse. Blazing analytics, fast search, geo insights, vector AI. Built for multimodal analytics. Open-source Snowflake alternative. https://databend.com

    Language: Rust · Stars: 9k
  • vaexio/vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

    Language: Python · Stars: 8.4k
  • apache/hudi

    Upserts, Deletes And Incremental Processing on Big Data.

    Language: Java · Stars: 6k
  • volcano-sh/volcano

    A Cloud Native Batch System (Project under CNCF)

    Language: Go · Stars: 5.1k
  • iGaoWei/BigDataView

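    100+ flashy HTML5 big-screen templates for big data visualization, covering industries such as community, property management, government, transportation, and finance and banking. Described as the newest, largest, most complete, and coolest collection of big data visualization templates on the web. Updated continuously.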

    Language: JavaScript · Stars: 4.7k
  • DTStack/chunjun

    A data integration framework

    Language: Java · Stars: 4.1k
  • liyupi/sql-generator

    🔨 Generate structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; the project is simple (heavy on logic, light on pages) and well suited for practice.

    Language: Vue · Stars: 3.5k
  • apache/avro

    Apache Avro is a data serialization system.

    Language: Java · Stars: 3.2k
  • MoRan1607/BigDataGuide

    Big data learning from scratch, including study videos for each learning stage and interview materials.

  • douban/dpark

    Python clone of Spark, a MapReduce-like framework in Python

    Language: Python · Stars: 2.7k
  • griddb/griddb

    GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

    Language: C++ · Stars: 2.5k
  • dotnet/spark

    .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

    Language: C# · Stars: 2.1k
  • DTStack/flinkStreamSQL

    Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax.

    Language: Java · Stars: 2.1k
  • shzlw/poli

    An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.

    Language: Java · Stars: 2k
  • byzer-org/byzer-lang

    Byzer (formerly MLSQL): A low-code open-source programming language for data pipelines, analytics and AI.

    Language: Scala · Stars: 1.9k
  • Netflix/genie

    Distributed Big Data Orchestration Service

    Language: Java · Stars: 1.8k
  • collabH/bigdata-growth

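    A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.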

    Language: Shell · Stars: 1.7k
  • YoongiKim/AutoCrawler

    Google, Naver multiprocess image web crawler (Selenium)

    Language: Python · Stars: 1.7k
  • jadianes/spark-py-notebooks

    Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

    Language: Jupyter Notebook · Stars: 1.7k
  • water8394/BigData-Interview

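    :dart: :star2: [Big data interview questions] A collection of big data interview questions gathered online, together with my own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.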

  • apconw/sanic-web

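    A lightweight large-model application project (Large Model Data Assistant) that covers the full pipeline and is easy to extend. It supports large models such as DeepSeek and Qwen3 and is a one-stop large-model application development project built on Dify, LangChain/LangGraph, Ollama & Vllm, Sanic, and Text2SQL 📊, with a modern UI built with Vue3, TypeScript, and Vite 5. It supports large-model-based chart Q&A over data via ECharts 📈 and can handle table Q&A over CSV files 📂. It can also easily integrate third-party open-source RAG and retrieval systems 🌐 to support broad general-knowledge Q&A.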

    Language: JavaScript · Stars: 1.6k
  • hi-primus/optimus

    :truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

    Language: Python · Stars: 1.5k
  • tensorbase/tensorbase

    TensorBase is a new big data warehousing with modern efforts.

    Language: Rust · Stars: 1.5k
  • odd-platform

    opendatadiscovery/odd-platform

    First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

    Language: Java · Stars: 1.4k
  • ganweisoft/TOMs

    TOMs is a fully open-source, high-performance, systematic, plugin-oriented, and scenario-agnostic general-purpose development framework.

    Language: Batchfile · Stars: 1.2k
  • kubernetes-retired/kube-batch

    A batch scheduler for Kubernetes, aimed at high-performance workloads, e.g. AI/ML, BigData, HPC

    Language: Go · Stars: 1.1k
  • josonle/Coding-Now

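    Study notes, plus e-books I have read, video resources, and blogs, websites, and tools I have collected and found useful. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.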

    Language: Python · Stars: 1k
  • apache/celeborn

    Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

    Language: Java · Stars: 1k