spark
There are 8,334 repositories under the spark topic.
spark-jobserver
REST job server for Apache Spark
cube-studio
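Cube Studio is an open-source, cloud-native, all-in-one machine learning / deep learning / large-model AI platform. It supports SSO login, multi-tenancy, big data platform integration, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU, edge computing, serverless, an annotation platform, automated annotation, dataset management, large-model fine-tuning, vLLM large-model inference, LLMOps, private knowledge bases, and an AI model app store. It offers one-click model development, inference, and fine-tuning. It supports domestic (Chinese) CPU/GPU/NPU chips, RDMA, and pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/spark/ray/volcano distributed workloads.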
dpark
Python clone of Spark, a MapReduce alike framework in Python
spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
BigDataGuide
Learn big data from scratch, with study videos and interview materials for each stage of learning.
spring-boot-quick
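Quick-start learning examples based on Spring Boot. They integrate open-source frameworks the author has encountered, such as rabbitmq (delay queues), Kafka, jpa, redis, oauth2, swagger, jsp, docker, k3s, k3d, k8s, and a mybatis encryption/decryption plugin. They also cover exception handling, logging, multi-module development, multi-environment packaging, caching, web crawling, jwt, GraphQL, dubbo, zookeeper, Async, and more.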
LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
TransmogrifAI
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
SZT-bigdata
Big data passenger-flow analysis system for the Shenzhen Metro 🚇🚄🌟
zio-quill
Compile-time Language Integrated Queries for Scala
Quicksql
A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
spark-ml-source-analysis
An in-depth look at the principles of Spark ML algorithms and an analysis of their source-code implementation
spark-cassandra-connector
DataStax Connector for Apache Spark to Apache Cassandra
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
benchm-ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Gaffer
A large-scale entity and relation database supporting aggregation of properties
.github
ApacheCN open-source organization: announcements, introduction, members, activities, and ways to get in touch
elassandra
Elassandra = Elasticsearch + Apache Cassandra
gatk
Official code repository for GATK versions 4 and up
spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Spark
✨Spark is a web-based, cross-platform, full-featured Remote Administration Tool (RAT) written in Go that lets you control all your devices from anywhere.
almond
A Scala kernel for Jupyter
Tutorial
A summary of the full-stack knowledge architecture for backend development (Java, Golang)
elephas
Distributed Deep learning with Keras & Spark
BigData-Interview
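[Big data interview questions] A collection of big data interview questions gathered online, along with the author's own answers. It currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.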
pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
mleap
MLeap: Deploy ML Pipelines to Production
seldon-server
Machine Learning Platform and Recommendation Engine built on Kubernetes
optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
apache-spark-internals
The Internals of Apache Spark
carbondata
High-performance data store solution
dji-firmware-tools
Tools for handling firmwares of DJI products, with focus on quadcopters.