data-infrastructure
There are 40 repositories under the data-infrastructure topic.
zalando/postgres-operator
Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
StructuredLabs/preswald
Preswald is a WASM packager for Python-based interactive data apps: bundle full, complex data workflows, particularly visualizations, into single files that run completely in-browser using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
CrunchyData/postgres-operator
Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
cocoindex-io/cocoindex
Data transformation framework for AI. Ultra performant, with incremental processing.
zalando/spilo
Highly available elephant herd: HA PostgreSQL cluster using Docker
tensorbase/tensorbase
TensorBase is a new big data warehouse built with modern efforts.
zalando/nakadi
A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues
zalando/PGObserver
A battle-tested, flexible & comprehensive monitoring solution for your PostgreSQL databases
thedataengineeringbook/thedataengineeringbook
The Data Engineering Book - a data engineering book by Thai people, for Thai people
zalando-incubator/spark-json-schema
JSON schema parser for Apache Spark
abhishek-ch/data-machinelearning-the-boring-way
Build & Learn Data Engineering, Machine Learning over Kubernetes. No-shortcut approach.
zalando-nakadi/kanadi
Kanadi is a Nakadi client for Scala
opensnowcat/opensnowcat-enrich
OpenSnowcat Enricher (Apache 2.0 License)
zalando-incubator/darty
Data dependency manager
bizzabo/elasticsearch_to_bigquery_data_pipeline
A generic data pipeline that maps Elasticsearch documents to BigQuery table rows
Jzbonner/dataengineering-db
Information relating to topics on Data Engineering, Data Infrastructure, Data Storing, Data Warehouses and Business Analysis. For those interested in both conceptual theory and use case examples for database design and development.
alphagov/consent-api
Service for sharing user consent to cookies across multiple domains
yennanliu/data_infra_repo
Collections of POC/dev data infrastructure. | #SE
anna-geller/kestra-terraform-examples
Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform
alphagov/sde-prototype-govuk
A fake GOV.UK homepage and start pages for SDE prototype services
amkrajewski/mpdd-alignn
MPDD Calculator for Atomistic Line Graph Neural Network Deployment
Noobzik/ATL-Datamart
Business intelligence architecture lab for students at EPSI and DC Paris. The goal is to deploy a data architecture that runs from data collection through to presentation as data visualizations, by way of a data lake, a data warehouse, and a data mart.
ablange/nix-data-mesh
A practical data mesh reference implementation, powered by open-source.
alphagov/analytics-settings-database
Export Google Analytics (GA4 and UA) settings
alphagov/sde-prototype-haas
SDE prototype dummy service - Hexagrams as a Service
ICPSR/mica-data-descriptor
Processing code for Scientific Data Descriptor paper
ilssaf/data-platform-deployer
CLI tool for automatic data platform deployment
k0rsakov/infrastructure_for_data_engineer_dbt
Infrastructure for data engineers: dbt
smdoh/pipebird
Open-source API to securely share data with customers.
iTrauco/streaming-data-platform
Skeleton streaming data platform on GCP...
apelullo/yelp_health_data_curation_ops
An AWS-based data pipeline to extract, process, store, and monitor Yelp "health-related" facility data in support of ongoing health system initiatives.
Corey4005/STEMNET-Daily-Files
The purpose of this repository is to create a data infrastructure that will communicate with the STEMNET server at the University of Alabama in Huntsville. In particular, the goal is to give anyone the capability to create clean daily files from all available stations on Linux machines.
maksimkurb/spilo
Fork of Zalando Postgres Spilo with pgvecto.rs and VectorChord extensions installed (Immich-compatible)
seedcase-project/template-data-package
An opinionated template for Data Packages built with Seedcase packages.