data-pipelines

There are 297 repositories under the data-pipelines topic.

  • public-datasets-pipelines

    Cloud-native, data onboarding architecture for Google Cloud Datasets

    Language: Python · 165
  • palimpzest

    A System for Optimized Semantic Computation

    Language: Python · 144
  • smart-data-lake

    Smart Automation Tool for building modern Data Lakes and Data Pipelines

    Language: Scala · 124
  • didact

    The open core .NET job orchestrator that we've been missing

    Language: C# · 120
  • Hoptimator

    Multi-hop declarative data pipelines

    Language: Java · 118
  • burla

    The simplest way to run Python on lots of computers.

    Language: TypeScript · 115
  • mycelial

    Move your data with ease.

    Language: Rust · 108
  • patterns-devkit

    Data pipelines from re-usable components

    Language: Python · 107
  • udacity-data-eng-proj-1

    Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3

    Language: Python · 90
  • datacater

    The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.

    Language: JavaScript · 84
  • python-sdk

    Conductor OSS SDK for Python programming language

    Language: Python · 84
  • beneath

    Beneath is a serverless real-time data platform ⚡️

    Language: Go · 84
  • exospherehost

    Infra for scalable and reliable AI agents

    Language: Python · 79
  • didact-engine

    The REST API and execution engine for the Didact Platform.

    Language: C# · 78
  • Udacity-Data-Engineer-nanodegree

    Classwork projects and homework done through the Udacity Data Engineering Nanodegree

    Language: Jupyter Notebook · 74
  • data_engineer_interview_challenges

    Found a data engineering challenge or participated in a selection process? Share with us!

    Language: Python · 65
  • xvc

    A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

    Language: Rust · 63
  • kenobi

    Easiest way to monitor asynchronous data pipelines

    Language: Python · 59
  • ops0-cli

    ops0 is an AI-powered natural language DevOps CLI native to Claude AI with Ansible, Terraform, Kubernetes, AWS, Azure, and Docker operations in a single CLI. An open-source alternative to complex DevOps workflows, manual operations, etc. 🤖 ⚡ 👉 Natural Language DevOps Automation & Troubleshooting Tool

    Language: Go · 57
  • uniflow

    A high-performance, extremely flexible, and easily extensible universal workflow engine.

    Language: Go · 53
  • CogStack-NiFi

    Building data processing pipelines for document processing with NLP using Apache NiFi and related services

    Language: Python · 53
  • Udacity-Data-Engineering-Nanodgree

    Udacity Data Engineering Nanodegree Program

    Language: Jupyter Notebook · 52
  • ml-in-production

    The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.

    Language: Python · 52
  • streams-explorer

    Explore Apache Kafka data pipelines in Kubernetes.

    Language: Python · 46
  • spark-transformers

    Spark-Transformers: Library for exporting Apache Spark MLlib models to use them in any Java application with no other dependencies.

    Language: Java · 42
  • learn-kafka-courses

    Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.

    Language: Shell · 39
  • kedro-pandera

    A kedro plugin to use pandera in your kedro projects

    Language: Python · 36
  • AirflowDataPipeline

    Example of an ETL Pipeline using Airflow

    Language: Python · 36
  • tabsdata

    A Pub/Sub for Tables based data integration platform, to discover, publish, modify and consume data effortlessly.

    Language: Rust · 35
  • dagster-odp

    A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code

    Language: Python · 35
  • examples

    Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services

    Language: HCL · 32
  • dbt-command-center

    Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.

    Language: TypeScript · 30
  • debezium-platform

    An opinionated data-centric view of Debezium components

    Language: TypeScript · 27
  • arakat

    ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform

    Language: Python · 27
  • stepist

    Framework for data processing

    Language: Python · 27
  • demo

    A starter dbt project and synthetic claims dataset for trying out the Tuva Project.