spark

There are 9262 repositories under the spark topic.

apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
Language: Scala · 42.3k 2k 0 28.9k
DataTalksClub/data-engineering-zoomcamp
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
Language: Jupyter Notebook · 33.4k 538 136 7.1k
donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Language: Python · 28.6k 1.6k 41 8k
getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Language: Python · 27.9k 567 2.6k 4.5k
yeasy/docker_practice
Learn and understand Docker & container technologies, with real DevOps practice!
Language: Go · 25.7k 836 215 5.8k
heibaiying/BigData-Notes
A beginner's guide to big data :star:
Language: Java · 16.7k 448 42 4.3k
FavioVazquez/ds-cheatsheets
List of Data Science Cheatsheets to rule the world
15.8k 549 17 4k
GaiZhenbiao/ChuanhuChatGPT
GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.
Language: Python · 15.4k 84 803 2.3k
zhisheng17/flink-learning
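Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source-code analysis. It includes learning examples for Flink Connector, Metrics, Library, the DataStream API, and the Table API & SQL, plus large-scale production case studies (PV/UV, log storage, real-time deduplication of ten billion records, monitoring and alerting). Please support my column "Flink, the Big Data Real-Time Computing Engine: Practice and Performance Optimization" (《大数据实时计算引擎 Flink 实战与性能优化》).
Language: Java · 15k 513 0 4k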
aalansehaiyang/technology-talk
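[Big-Tech Interview Column] A technical guide for Java programmers, with interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a better version of yourself!
14.7k 830 25 3.8k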
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Language: Python · 14.6k 326 2.2k 2.3k
apache/doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
Language: Java · 14.6k 281 8.1k 3.6k
deeplearning4j/deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
Language: Java · 14.1k 762 5.8k 3.9k
wangzhiwubigdata/God-Of-BigData
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
10.3k 330 0 3.2k
tobymao/sqlglot
Python SQL Parser and Transpiler
Language: Python · 8.5k 47 2.5k 1k
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Language: Python · 8.5k 63 1k 888
delta-io/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Language: Scala · 8.4k 214 1.8k 1.9k
h2oai/h2o-3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Language: Jupyter Notebook · 7.4k 380 9.6k 2k
Alluxio/alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
Language: Java · 7.1k 439 2.2k 3k
Angel-ML/angel
A Flexible and Powerful Parameter Server for large-scale machine learning
Language: Java · 6.8k 440 629 1.6k
apache/zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Language: Java · 6.6k 302 0 2.8k
donnemartin/dev-setup
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Language: Python · 6.2k 185 48 1.2k
microsoft/SynapseML
Simple and Distributed Machine Learning
Language: Scala · 5.2k 137 764 852
tencentmusic/cube-studio
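cube studio: an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform. It offers a full MLOps pipeline, a compute-rental platform, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, an annotation platform with automated labeling, SFT fine-tuning / reward-model / reinforcement-learning training for large models such as deepseek, multi-node large-model inference with vllm/ollama/mindie, private knowledge bases, and an AI model marketplace. It supports domestic Chinese CPUs/GPUs/NPUs and the Ascend ecosystem, RDMA, and distributed frameworks such as pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/ray/volcano.
Language: Python · 4.7k 77 157 822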
JohnSnowLabs/spark-nlp
State of the Art Natural Language Processing
Language: Scala · 4.1k 98 908 733
Cyb3rWard0g/HELK
The Hunting ELK
Language: Jupyter Notebook · 3.9k 215 453 703
yahoo/TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Language: Python · 3.9k 274 369 942
RoaringBitmap/RoaringBitmap
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others
Language: Java · 3.8k 127 346 578
awslabs/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Language: Scala · 3.5k 74 348 573
lw-lin/CoolplaySpark
Coolplay Spark: Spark source-code analysis, Spark libraries, and more
Language: Scala · 3.5k 438 37 1.4k
liyupi/sql-generator
🔨 Generate structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; a simple project (heavy on logic, light on UI) that is good for practice.
Language: Vue · 3.5k 20 21 707
apache/linkis
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Language: Java · 3.4k 262 2.6k 1.2k
databricks/koalas
Koalas: pandas API on Apache Spark
Language: Python · 3.4k 315 590 367
WeBankFinTech/DataSphereStudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Language: Java · 3.2k 179 750 1k
spark-notebook/spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
Language: JavaScript · 3.2k 185 515 653
MoRan1607/BigDataGuide
Big data learning from scratch, with study videos for every stage of learning big data, plus interview materials.
3.1k 49 5918
