Pinned Repositories
aliyun-emapreduce-sdk
Hadoop/Spark on Aliyun, supporting interactions with Aliyun's base services.
Alluxio
Memory-centric Storage System for Big Data Analytics
argo
ArgoProj: Get stuff done with Kubernetes.
arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
DataflowJavaSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
flink
Mirror of Apache Flink
HiBench
HiBench is a Hadoop benchmark suite.
shark
Hive on Spark
spark
Mirror of Apache Spark
shimingfei's Repositories
shimingfei/argo
ArgoProj: Get stuff done with Kubernetes.
shimingfei/arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
shimingfei/cling
The interactive C++ interpreter Cling
shimingfei/spark
Mirror of Apache Spark
shimingfei/blaze
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
shimingfei/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
shimingfei/dolphinscheduler
Apache DolphinScheduler is the modern data workflow orchestration platform with a powerful user interface, dedicated to solving complex task dependencies in the data pipeline and providing various types of jobs available `out of the box`.
shimingfei/foundationdb
FoundationDB - the open source, distributed, transactional key-value store
shimingfei/godel-scheduler
A unified scheduler for online and offline tasks
shimingfei/hudi
Upserts, Deletes And Incremental Processing on Big Data.
shimingfei/incubator-gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
shimingfei/incubator-iceberg
Apache Iceberg (Incubating)
shimingfei/javacv
Java interface to OpenCV, FFmpeg, and more
shimingfei/JavaGuide
[Java Learning + Interview Guide] A guide covering the core knowledge that most Java programmers need to master.
shimingfei/katalyst-core
Katalyst aims to provide a universal solution to help improve resource utilization and optimize the overall costs in the cloud. This repository holds the core components of the Katalyst system, including multiple agents and centralized components.
shimingfei/koordinator
A QoS-based scheduling system that brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
shimingfei/kubeadmiral
Multi-Cluster Kubernetes Orchestration
shimingfei/kudu-rpm
RPM packages for Apache Kudu on CentOS 7
shimingfei/lede
Lean's OpenWrt source
shimingfei/llama_index
LlamaIndex is a data framework for your LLM applications
shimingfei/LLMsPracticalGuide
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
shimingfei/mlflow
Open source platform for the complete machine learning lifecycle
shimingfei/shadowsocks-libev
libev port of shadowsocks
shimingfei/spark-jobserver
REST job server for Apache Spark
shimingfei/substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
shimingfei/velox
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
shimingfei/volcano
A Cloud Native Batch System (Project under CNCF)
shimingfei/wasmtime
A fast and secure runtime for WebAssembly
shimingfei/weld-java
JVM integration for Weld
shimingfei/yunikorn-core
Apache YuniKorn Core