dataops
There are 159 repositories under the dataops topic.
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
lancedb/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
redpanda-data/console
Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.
whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
lensesio/fast-data-dev
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, 20+ connectors
elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
TobikoData/sqlmesh
Efficient data transformation and modeling framework that is backwards compatible with dbt.
alibaba/SREWorks
Cloud Native DataOps & AIOps Platform
datavane/tis
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI
raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
tenzir/tenzir
Tenzir is the data pipeline engine for security teams.
Azure-Samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
taivop/awesome-data-annotation
A list of tools for annotating data, managing annotations, etc.
polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Titan-Systems/titan
Titan Core - Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.
flowerfine/scaleph
Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel on Flink engine, the Flink Kubernetes Operator, and the Doris operator.
raystack/firehose
Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
pbi-tools/pbi-tools
Power BI DevOps & Source Control Tool
merantix-momentum/squirrel-core
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
raystack/frontier
Frontier is an all-in-one user management platform that provides identity, access and billing management to help organizations secure their systems and data. (Open source alternative to Clerk)
raystack/dagger
Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.
awslabs/aws-ddk
An open source development framework to help you build data workflows and modern data architecture on AWS.
raystack/stencil
Stencil is a schema registry that provides schema management and validation dynamically, efficiently, and reliably to ensure data compatibility across applications.
raystack/raccoon
Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols.
raystack/meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
garystafford/tickit-data-lake-demo
Resources for video demonstrations and blog posts related to DataOps on AWS
kelvins/awesome-dataops
😎 A curated list of awesome DataOps tools
lensesio/lenses-docker
❤ for real-time DataOps - where the application and data fabric blend - Lenses
google/space
Unified storage framework for the entire machine learning lifecycle
raptor-ml/raptor
Transform your Pythonic research into an artifact that engineers can deploy easily.
gojekfarm/beast
[Deprecated] Load data from Kafka to any data warehouse. The BQ sink is now supported in Firehose. https://github.com/odpf/firehose
datachecks/dcs-core
Open Source Data Quality Monitoring.