dataops

There are 151 repositories under the dataops topic.

  • cleanlab/cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    Language:Python8.9k85344686
  • flyteorg/flyte

    Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

    Language:Go5.1k2603k546
  • redpanda-data/console

    Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.

    Language:Go3.6k45542334
  • lancedb/lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.

    Language:Rust3.4k39812179
  • whylabs/whylogs

    An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

    Language:Jupyter Notebook2.6k32422118
  • lensesio/fast-data-dev

    Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors

    Language:Shell2k50128329
  • elementary-data/elementary

    The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

    Language:HTML1.8k9497152
  • alibaba/SREWorks

    Cloud Native DataOps & AIOps Platform | Cloud-native intelligent data operations platform

    Language:Java1.7k5360387
  • meltano/meltano

    Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

    Language:Python1.6k136.7k145
  • TobikoData/sqlmesh

    Efficient data transformation and modeling framework that is backwards compatible with dbt.

    Language:Python1.4k18434116
  • datavane/tis

    Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI.

    Language:Java91042278205
  • raystack/optimus

    Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

    Language:Go74018268153
  • tenzir/tenzir

    Open source security data pipelines.

    Language:C++62235085
  • opendatadiscovery/awesome-data-catalogs

    📙 Awesome Data Catalogs and Observability Platforms.

  • taivop/awesome-data-annotation

    A list of tools for annotating data, managing annotations, etc.

  • Azure-Samples/modern-data-warehouse-dataops

    DataOps for the Modern Data Warehouse on Microsoft Azure. https://aka.ms/mdw-dataops.

    Language:Shell54455306438
  • polyaxon/traceml

    Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

    Language:Python493141443
  • vmware/versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

    Language:Python4141694754
  • flowerfine/scaleph

    Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel on Flink engine, the Flink Kubernetes Operator, and the Doris operator.

    Language:Java3421019299
  • raystack/firehose

    Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.

    Language:Java314154452
  • pbi-tools/pbi-tools

    Power BI DevOps & Source Control Tool

    Language:C#3012329353
  • Titan-Systems/titan

    Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.

    Language:Python300101218
  • merantix-momentum/squirrel-core

    A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

    Language:Python28116118
  • raystack/dagger

    Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.

    Language:Java257166241
  • awslabs/aws-ddk

    An open source development framework to help you build data workflows and modern data architecture on AWS.

    Language:TypeScript247915220
  • raystack/frontier

    Frontier is an all-in-one user management platform that provides identity, access and billing management to help organizations secure their systems and data. (Open source alternative to Clerk)

    Language:Go239147730
  • raystack/stencil

    Stencil is a schema registry that provides schema management and validation dynamically, efficiently, and reliably to ensure data compatibility across applications.

    Language:Go217323941
  • raystack/raccoon

    Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols.

    Language:Go189142329
  • raystack/meteor

    Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.

    Language:Go174915239
  • garystafford/tickit-data-lake-demo

    Resources for video demonstrations and blog posts related to DataOps on AWS

    Language:Python1585395
  • lensesio/lenses-docker

    ❤ for real-time DataOps, where the application and data fabric blend: Lenses

    Language:Shell151121923
  • gojekfarm/beast

    [Deprecated] Load data from Kafka to any data warehouse. The BQ sink is now supported in Firehose. https://github.com/odpf/firehose

    Language:Java147252423
  • raptor-ml/raptor

    Transform your pythonic research to an artifact that engineers can deploy easily.

    Language:Go143318311
  • google/space

    Unified storage framework for the entire machine learning lifecycle

    Language:Python141937
  • raystack/guardian

    Guardian is a universal data access management tool with automated access workflows and security controls across data stores, analytical systems, and cloud products.

    Language:Go1351012718
  • waterdipai/datachecks

    Open Source Data Quality Monitoring.

    Language:Python13126018