mosche
Principal Data Engineer, Apache Beam Committer and Open Source enthusiast - Open to work!
Talend
Munich, Germany
mosche's Stars
mingrammer/diagrams
:art: Diagram as Code for prototyping cloud system architectures
pola-rs/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
google/comprehensive-rust
This is the Rust course used by the Android team at Google. It provides you the material to quickly teach Rust.
apache/skywalking
APM, Application Performance Monitoring System
LMAX-Exchange/disruptor
High Performance Inter-Thread Messaging Library
keon/awesome-nlp
:book: A curated list of resources dedicated to Natural Language Processing (NLP)
DataTalksClub/mlops-zoomcamp
Free MLOps course from DataTalks.Club
StarRocks/starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
apache/seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
apache/pinot
Apache Pinot - A realtime distributed OLAP datastore
apache/calcite
Apache Calcite
shd101wyy/markdown-preview-enhanced
One of the 'BEST' markdown preview extensions for Atom editor!
apache/incubator-streampark
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
apache/linkis
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
awslabs/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
apache/paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
ivan-bilan/The-NLP-Pandect
A comprehensive reference for all topics related to Natural Language Processing
sodadata/soda-core
:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
pinterest/querybook
Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.
OpenLineage/OpenLineage
An Open Standard for lineage metadata collection
Netflix/mantis
A platform that makes it easy for developers to build realtime, cost-effective, operations-focused applications
substrait-io/substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
kwai/blaze
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
zhenlohuang/awesome-distributed-systems
A curated list of awesome distributed systems books, papers, resources and shiny things.
iterative/mlem
🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
ColinEberhardt/awesome-public-streaming-datasets
A list of free datasets that provide streaming data
jzillmann/jmh-visualizer
Visually explore your JMH Benchmarks
DeepHiveMind/gateway_to_DeepReinforcementLearning_DeepNN
:trophy: Welcome to the wonderland of "AI" = f(DL, RL, DRL, ML, NLP, KG, MLOPS)