Awesome Infrastructure

A collection of awesome software infrastructure projects and companies.


Change Data Capture

  • Arcion - Arcion is a change data capture platform that enables you to stream data from your database to your data warehouse in real time.
  • Debezium - Debezium is an open source distributed platform for change data capture.

Caches

  • ReadySet - A lightweight query cache that sits between an application and database turning SQL reads into lightning-fast lookups.

Data Lakes

  • Bauplan - A serverless lakehouse for complex data workloads.
  • OneTable - OneTable is an open source project that provides omni-directional interoperability between lakehouse table formats such as Apache Hudi, Apache Iceberg and Delta Lake.

Graph Databases

  • ArangoDB - Graph database that also works as a multi-model database supporting documents.
  • Dgraph - Dgraph is an open source, low latency, high throughput, native and distributed graph database.
  • Kuzu - Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
  • Neo4j - Neo4j is a native graph database, built from the ground up to leverage not only data but also data relationships.
  • TigerGraph - TigerGraph is a native parallel graph database platform for enterprise applications.
  • Neptune - Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets.

Key-Value Stores

  • Venice - Venice is a derived data platform providing high throughput ingestion from batch, streams, and lambda/kappa architectures, and low latency online reads, for ML feature storage, etc.

OLTP Databases

  • Neon - Serverless Postgres. Neon separates storage and compute to offer autoscaling, branching, and bottomless storage.
  • TigerBeetle - TigerBeetle is a financial accounting database designed for mission-critical safety and performance to power the future of financial services.

OLAP Databases

  • ClickHouse - ClickHouse is a column-oriented database that enables its users to generate powerful analytics, using SQL queries, in real time.
  • Doris - Apache Doris is an easy-to-use, high performance and unified analytics database.
  • Druid - Apache Druid: a high performance real-time analytics database.
  • Materialize - Materialize is a data warehouse purpose-built for operational workloads where an analytical data warehouse would be too slow, and a stream processor would be too complicated.
  • Pinot - Apache Pinot is a distributed OLAP datastore, designed to answer OLAP queries with low latency.

Search Engines

  • Quickwit - Quickwit is a cloud-native distributed search engine designed to execute powerful search and analytics queries directly on cloud storage.

Vector Stores

  • LanceDB - LanceDB is an open-source database that uses the Lance file format for vector search.
  • Turbopuffer - A serverless database for low latency vector search.

Durable Execution

  • Temporal - Temporal is a microservice orchestration platform which enables developers to build scalable applications without sacrificing productivity or reliability.
  • Azure Durable Functions - Durable Functions is an extension of Azure Functions that lets you write stateful functions in a serverless compute environment.
  • Conductor - Conductor is a microservices orchestration engine from Netflix.
  • Convex - Convex is a full cloud backend designed to replace your database, server functions, backend functionality, and the interface all the way out to your application.
  • coroutine - A durable coroutine compiler and runtime library for Go.
  • durabletask-go - The Durable Task Framework is a lightweight, embeddable engine for writing durable, fault-tolerant business logic (orchestrations) as ordinary code.
  • Flawless - Flawless is an execution engine for durable computation.
  • Infinitic - Infinitic is a general-purpose framework built on Pulsar to reliably orchestrate microservices, manage distributed transactions, operate data pipelines, build user-facing automation, etc.
  • Inngest - Inngest is the developer platform for easily building reliable workflows with zero infrastructure.
  • Laravel Workflow - Durable workflow engine that allows users to track job status, orchestrate microservices, and write long-running, persistent, distributed workflows in PHP, powered by Laravel Queues.
  • LittleHorse - LittleHorse is a high-performance microservice orchestration engine that allows developers to build scalable, maintainable, and observable applications.
  • Rama - Rama is a new programming platform that combines databases and stream processing with fault-tolerant computation.
  • Resonate - Resonate is a lightweight durable execution engine made to help you keep your promises.
  • Restate - Write RPC and event handlers, and Restate makes them reliable by adding durability to invocations, promises, communication and state.

File Formats

  • GraphAR - An open source, standard data file format for graph data storage and retrieval.
  • Lance - Modern columnar data format for ML and LLMs, implemented in Rust.
  • ORC - Apache ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads.
  • Parquet - Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.

Functions as a Service

  • Lambda - AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume.
  • Google Cloud Functions - Google Cloud Functions is a serverless execution environment for building and connecting cloud services. With Cloud Functions you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services.
  • Azure Functions - Azure Functions is a serverless compute service that lets you run event-triggered code without having to explicitly provision or manage infrastructure.
  • OpenFaaS - OpenFaaS makes it easy for developers to deploy event-driven functions and microservices to Kubernetes without repetitive boilerplate coding.
  • Knative - Kubernetes-based platform to build, deploy, and manage modern serverless workloads.
  • Fission - Fission is a framework for serverless functions on Kubernetes. It allows you to easily create HTTP services on Kubernetes from functions.
  • OpenLambda - OpenLambda is an Apache-licensed serverless computing project, written (mostly) in Go and based on Linux containers.
  • Wasmer Edge - Wasmer Edge allows running cloud apps easily at the Edge, scaling them like they are serverless.

Workflow

  • Airflow - Airflow is a platform to programmatically author, schedule and monitor workflows.
  • Flyte - Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
  • Kestra - Scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
  • Prefect - Prefect is a workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
  • Dagster - Dagster is a data orchestrator for machine learning, analytics, and ETL.

Query Engines

  • Calcite - Apache Calcite is a dynamic data management framework. It contains many of the pieces that comprise a typical database management system but omits the storage primitives.
  • Daft - Daft is a distributed query engine with a Python DataFrame API. It is built in Rust and integrates tightly with the Python ML ecosystem, such as Ray and PyTorch.
  • DataFusion - DataFusion is a very fast, extensible query engine for building high-quality data-centric systems.
  • optd - CMU-DB's Cascades optimizer framework for query engines. Currently, optd is integrated into Apache Arrow DataFusion as a physical optimizer.
  • Substrait - A cross-platform way to express data transformations, relational algebra, standardized record expressions, and plans.
  • Velox - A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

Service Mesh

  • Istio - Istio is an open platform for providing a uniform way to integrate microservices, manage traffic flow across microservices, enforce policies, and aggregate telemetry data.
  • Linkerd - Linkerd is an ultralight, security-first service mesh for Kubernetes. Linkerd adds critical security, observability, and reliability features to your Kubernetes stack with no code change required.

Message Brokers

  • AutoMQ - Cloud-native implementations of Kafka and RocketMQ.
  • Kafka - An open-source distributed event streaming platform.
  • Pravega - Pravega is a distributed, tiered storage system for data streams. Pravega streams are durable, unbounded, cost-effective, and elastic: an ideal storage substrate for stream processing pipelines.
  • WarpStream - WarpStream is a Kafka-compatible data streaming platform built directly on top of S3.

Stream Processing

  • Apache Flink - Stateful computations over bounded and unbounded data streams.
  • Arroyo - Distributed stream processing engine, designed to make it easy for anyone to build correct, efficient, and reliable real-time data pipelines with SQL or Rust.
  • Decodable - A managed platform for stream processing and real-time ETL, powered by Apache Flink and Debezium.
  • Kafka Streams - A stateful stream processing library for Kafka.
  • Responsive - Responsive is the platform for developers building stateful reactive applications on the modern cloud, focused on Kafka Streams.
  • RisingWave - RisingWave is a distributed SQL database for stream processing. It consumes streaming data, performs incremental computations when new data comes in, and updates results dynamically. As a database system, RisingWave maintains results in its own storage so that users can access data efficiently.

Virtual Machines

  • Firecracker - Firecracker is an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services that provide serverless operational models.
  • gVisor - gVisor is an application kernel, written in Go, that implements a substantial portion of the Linux system surface.
  • KVM - KVM (for Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V).
  • QEMU - QEMU is a generic and open source machine & userspace emulator and virtualizer.
  • VirtualBox - VirtualBox is a powerful x86 and AMD64/Intel64 virtualization product for enterprise as well as home use.

Miscellaneous

  • Bacalhau - Compute over Data framework for public, transparent, and optionally verifiable computation.

Contributing

Contributions welcome! Read the contribution guidelines first.

License

CC0

To the extent possible under law, criccomini has waived all copyright and related or neighboring rights to this work.