Sangrho

Data engineer

Dunamu

Sangrho's Stars

apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Language: Python 36.4k 756 9.6k 14.1k
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Language: Python 33.1k 474 18.4k 5.6k
apache/flink
Apache Flink
Language: Java 23.8k 948 0 13.3k
EthicalML/awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
17.3k 402 762.2k
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Language: Python 15.8k 164 5.5k 1.6k
zhisheng17/flink-learning
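Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine Flink: Practice and Performance Optimization" is welcome.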
Language: Java 14.5k 515 0 3.9k
apache/dolphinscheduler
Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
Language: Java 12.7k 329 7.6k 4.6k
yahoo/CMAK
CMAK is a tool for managing Apache Kafka clusters
Language: Scala 11.8k 532 6872.5k
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Language: Java 10.2k 174 6.6k 2.9k
redpanda-data/redpanda
Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
Language: C++ 9.4k 138 11.3k 579
vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Language: Python 8.3k 143 1.2k 590
vespa-engine/vespa
AI + Data, online. https://vespa.ai
Language: Java 5.6k 160 983589
facebookresearch/Kats
Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.
Language: Python 4.9k 79 185535
nikitavoloboev/wiki
Everything I know
4.8k 120 45567
amundsen-io/amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Language: Python 4.4k 234 685955
apache/incubator-heron
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
Language: Java 3.6k 283 1.1k 597
ploomber/ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Language: Python 3.5k 30 865237
tchiotludo/akhq
Kafka GUI for Apache Kafka to manage topics, topics data, consumers group, schema registry, connect and more...
Language: Java 3.4k 61 1k 652
apache/kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Language: Scala 2.1k 64 2.3k 903
MarquezProject/marquez
Collect, aggregate, and visualize a data ecosystem's metadata
Language: Java 1.7k 48 786310
Netflix/metacat
Language: Java 1.6k 400 58279
apache/aurora
Apache Aurora - A Mesos framework for long-running services, cron jobs, and ad-hoc jobs
Language: Java 635 156 36232
qubole/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Language: Scala 562 30 56138
astronomer/astronomer
Helm Charts for the Astronomer Platform, Apache Airflow as a Service on Kubernetes
Language: Python 463 46 27986
apache/apex-core
Mirror of Apache Apex core
Language: Java 350 50 0 176
uber/RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
Language: Java 321 19 42100
apache/hbase-connectors
Apache HBase Connectors
Language: Scala 235 51 0 176
awslabs/emr-dynamodb-connector
Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Language: Java 216 52 100135
kafka-lens/kafka-lens
A tool for monitoring Kafka clusters, topics, partitions, and message flow.
Language: JavaScript 192 5 828
kevink1103/pyprnt
A Modern Python Pretty Printer
Language: Python 73 5 19
