dataops

There are 206 repositories under the dataops topic.

cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Language: Python 11.1k 84 377874
flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
Language: Go 6.6k 253 3.5k759
lancedb/lance
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch, with more integrations coming.
Language: Rust 5.7k 48 1.8k471
redpanda-data/console
Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.
Language: TypeScript 4.2k 45 645406
whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Language: Jupyter Notebook 2.8k 31 433134
TobikoData/sqlmesh
Scalable and efficient data transformation framework - backwards compatible with dbt.
Language: Python 2.7k 27 1k313
meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Language: Python 2.3k 8 6.8k186
elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: HTML 2.2k 11 657202
lensesio/fast-data-dev
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, 20+ connectors
Language: Shell 2.1k 45 133339
alibaba/SREWorks
Cloud Native DataOps & AIOps Platform | Cloud-native data-intelligence operations platform
Language: Java 1.9k 52 62422
datavane/tis
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI
Language: Java 1.2k 39 420261
opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
935 23 867
statespace-tech/toolfront
Design AI applications in Markdown
Language: Python 78757
raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Language: Go 755 15 268154
tenzir/tenzir
Tenzir is the data pipeline engine for security teams.
Language: C++ 705 33 0100
Azure-Samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
Language: Shell 676 60 608510
taivop/awesome-data-annotation
A list of tools for annotating data, managing annotations, etc.
606 20 1067
polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Language: Python 520 10 1445
Titan-Systems/titan
Titan Core - Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.
Language: Python 478 18 8541
vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Language: Python 470 13 94862
DataRecce/recce
The data-validation toolkit for enhanced dbt (data build tool) PR review
Language: Python 432 7 9823
flowerfine/scaleph
Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel-on-Flink engine, the Flink Kubernetes Operator, and the Doris operator.
Language: Java 392 10 194113
pbi-tools/pbi-tools
Power BI DevOps & Source Control Tool
Language: C# 387 23 34477
raystack/firehose
Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
Language: Java 340 12 4463
raystack/frontier
Frontier is an all-in-one user management platform that provides identity, access and billing management to help organizations secure their systems and data. (Open source alternative to Clerk, WorkOS)
Language: Go 307 11 9040
raystack/dagger
Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.
Language: Java 277 14 6242
awslabs/aws-ddk
An open source development framework to help you build data workflows and modern data architecture on AWS.
Language: TypeScript 271 12 15325
raystack/stencil
Stencil is a schema registry that provides schema management and validation dynamically, efficiently, and reliably to ensure data compatibility across applications.
Language: Go 236 28 4143
raystack/meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
Language: Go 218 8 15046
raystack/raccoon
Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols.
Language: Go 211 14 3331
kelvins/awesome-dataops
😎 A curated list of awesome DataOps tools
Language: Python 209 8 034
garystafford/tickit-data-lake-demo
Resources for video demonstrations and blog posts related to DataOps on AWS
Language: Python 181 4 3116
datachecks/dcs-core
Open Source Data Quality Monitoring.
Language: Python 163 2 10322
lensesio/lenses-docker
❤ for real-time DataOps, where the application and data fabric blend - Lenses
Language: Shell 159 11 2024
google/space
Unified storage framework for the entire machine learning lifecycle
Language: Python 155 6 38
raptor-ml/raptor
Transform your pythonic research to an artifact that engineers can deploy easily.
Language: Go 154 3 18412