Sangrho's Stars
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
apache/flink
Apache Flink
EthicalML/awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
zhisheng17/flink-learning
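flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus case studies of large-scale production Flink projects (PVUV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Your support for my column "Big Data Real-Time Computing Engine Flink: Practice and Performance Optimization" is welcome.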
apache/dolphinscheduler
Apache DolphinScheduler is a modern data orchestration platform for quickly building high-performance workflows with low code.
yahoo/CMAK
CMAK is a tool for managing Apache Kafka clusters
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
redpanda-data/redpanda
Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
vespa-engine/vespa
AI + Data, online. https://vespa.ai
facebookresearch/Kats
Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.
nikitavoloboev/wiki
Everything I know
amundsen-io/amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
apache/incubator-heron
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
ploomber/ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
tchiotludo/akhq
Kafka GUI for Apache Kafka to manage topics, topic data, consumer groups, schema registry, connect and more...
apache/kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
MarquezProject/marquez
Collect, aggregate, and visualize a data ecosystem's metadata
Netflix/metacat
apache/aurora
Apache Aurora - A Mesos framework for long-running services, cron jobs, and ad-hoc jobs
qubole/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
astronomer/astronomer
Helm Charts for the Astronomer Platform, Apache Airflow as a Service on Kubernetes
apache/apex-core
Mirror of Apache Apex core
uber/RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
apache/hbase-connectors
Apache HBase Connectors
awslabs/emr-dynamodb-connector
Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
kafka-lens/kafka-lens
A tool for monitoring Kafka clusters, topics, partitions, and message flow.
kevink1103/pyprnt
A Modern Python Pretty Printer