spark

There are 8334 repositories under the spark topic.

spark-jobserver
REST job server for Apache Spark
Language: Scala · Stars: 2.8k
cube-studio
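cube studio: an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform. Supports SSO login, multi-tenancy, big data platform integration, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference services, vGPU, edge computing, serverless, a labeling platform, automated labeling, dataset management, large-model fine-tuning, vLLM large-model inference, LLMOps, private knowledge bases, and an AI model app store; one-click model development/inference/fine-tuning; domestic (Chinese-made) CPU/GPU/NPU chips; RDMA; and pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/spark/ray/volcano distributed workloads.
Language: Jupyter Notebook · Stars: 2.7k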
dpark
Python clone of Spark, a MapReduce-like framework in Python
Language: Python · Stars: 2.7k
spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Language: Go · Stars: 2.7k
BigDataGuide
Learning big data from scratch: study videos for each stage of learning big data, plus interview materials.
Language: Java · Stars: 2.5k
spring-boot-quick
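Quick-learning examples based on Spring Boot, integrating open-source frameworks the author has come across, such as rabbitmq (delay queues), Kafka, jpa, Redis, oauth2, swagger, jsp, docker, k3s, k3d, k8s, a mybatis encryption/decryption plugin, exception handling, logging, multi-module development, multi-environment packaging, caching, web crawling, jwt, GraphQL, dubbo, zookeeper, Async, and more.
Language: Java · Stars: 2.4k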
LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Language: Java · Stars: 2.3k
TransmogrifAI
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Language: Scala · Stars: 2.2k
SZT-bigdata
Shenzhen Metro big data passenger-flow analysis system 🚇🚄🌟
Language: Scala · Stars: 2.2k
zio-quill
Compile-time Language Integrated Queries for Scala
Language: Scala · Stars: 2.1k
Quicksql
A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Language: Java · Stars: 2.1k
paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Language: Java · Stars: 2k
spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# · Stars: 2k
kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Language: Scala · Stars: 2k
spark-ml-source-analysis
Analysis of the principles behind Spark ML algorithms and of their concrete source-code implementations
Stars: 1.9k
spark-cassandra-connector
DataStax Connector for Apache Spark to Apache Cassandra
Language: Scala · Stars: 1.9k
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Language: Python · Stars: 1.9k
benchm-ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Language: R · Stars: 1.9k
ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Language: C++ · Stars: 1.8k
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Language: Java · Stars: 1.7k
.github
ApacheCN open-source organization: announcements, introduction, members, events, and ways to get in touch
Language: CSS · Stars: 1.7k
elassandra
Elassandra = Elasticsearch + Apache Cassandra
Language: Java · Stars: 1.7k
gatk
Official code repository for GATK versions 4 and up
Language: Java · Stars: 1.6k
spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Language: Jupyter Notebook · Stars: 1.6k
Spark
✨Spark is a web-based, cross-platform and full-featured Remote Administration Tool (RAT) written in Go that allows you to control all your devices anywhere.
Language: Go · Stars: 1.6k
almond
A Scala kernel for Jupyter
Language: Scala · Stars: 1.6k
Tutorial
A summary of the full-stack knowledge architecture for backend development (Java, Golang)
Language: Shell · Stars: 1.6k
elephas
Distributed Deep learning with Keras & Spark
Language: Python · Stars: 1.6k
BigData-Interview
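[Big Data Interview Questions] Big data interview questions collected from the web, shared together with my own answer summaries. Currently covers interview questions for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.
Stars: 1.6k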
pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
Language: Python · Stars: 1.5k
mleap
MLeap: Deploy ML Pipelines to Production
Language: Scala · Stars: 1.5k
seldon-server
Machine Learning Platform and Recommendation Engine built on Kubernetes
Language: Java · Stars: 1.5k
optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Language: Python · Stars: 1.4k
apache-spark-internals
The Internals of Apache Spark
Stars: 1.4k
carbondata
High performance data store solution
Language: Scala · Stars: 1.4k
dji-firmware-tools
Tools for handling firmwares of DJI products, with focus on quadcopters.
Language: C · Stars: 1.4k