The Feldera Continuous Analytics Platform

License: MIT

The Feldera Continuous Analytics Platform, or Feldera Platform for short, is a fast computational engine with associated components for continuous analytics over data in motion. Feldera Platform lets users configure data pipelines as standing SQL programs (DDLs) that are continuously evaluated as new data arrives from various sources. What makes Feldera's engine unique is its ability to evaluate arbitrary SQL programs incrementally, which makes it more expressive and performant than existing alternatives such as streaming engines.

With the Feldera Platform, software engineers and data scientists configuring data pipelines are not exposed to the complexities of querying changing data, an otherwise notoriously hard problem. Instead, they can express their computations as standing queries and have the Feldera Platform evaluate these queries incrementally, correctly and efficiently.

To this end we set the following high-level objectives:

  1. Full SQL support and more. Our goal is to support the complete SQL syntax and semantics, including joins and aggregates, correlated subqueries, window functions, complex data types, time series operators, UDFs, and recursive queries.

  2. Scalability in multiple dimensions. The platform scales with the number and complexity of queries, the input data rate, and the amount of state the system maintains in order to process the queries.

  3. Performance out of the box. The user should be able to focus on the business logic of their application, leaving it to the system to evaluate this logic efficiently.

Architecture

With Feldera Platform, users create data pipelines out of SQL programs and data connectors. An SQL program comprises tables and views. Connectors feed data to input tables in a program or receive outputs computed by views. Currently supported connectors include Kafka, Redpanda, and an HTTP API to push data to and pull data from tables and views directly. We are working on more connectors, such as ones for database CDC streams. Let us know of any connectors you would like us to develop.
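
To make the roles of connectors, tables, and views concrete, here is a minimal Python sketch of the data flow only. It does not use the Feldera API: `run_pipeline`, `large_orders`, and the `orders` rows are hypothetical names chosen for illustration. An in-memory list stands in for an input connector, a callback for an output connector, and each change is assumed to be a `(row, weight)` pair where `+1` is an insert and `-1` is a delete.

```python
def run_pipeline(source, view, sink):
    """Feed each batch of changes from `source` through `view` into `sink`."""
    for batch in source:
        output = view(batch)
        if output:  # only notify the sink when the view actually changed
            sink(output)


def large_orders(changes):
    """Stand-in for a view like: SELECT * FROM orders WHERE amount > 100.

    A filter can be applied to the changes directly, so the view's output
    changes are just the input changes that satisfy the predicate.
    """
    return [(row, weight) for row, weight in changes if row[1] > 100]


# Hypothetical input: two batches of changes to an `orders` table.
batches = [
    [(("o1", 250), +1), (("o2", 40), +1)],
    [(("o1", 250), -1), (("o3", 500), +1)],
]

received = []  # plays the role of an output connector
run_pipeline(batches, large_orders, received.append)
```

In this sketch the sink sees only changes to the view, never the full view contents, which mirrors how connectors are described above: inputs flow into tables, and outputs are whatever the views compute from them.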

Feldera Platform fundamentally operates on changes to data, i.e., inserts and deletes to tables. This model covers all kinds of data-in-motion use cases, like insert-only streams of event, log, HTTP and timeseries data, as well as changes to traditional databases extracted via CDC streams.
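
One way to picture this change-based model is to store a table as a multiset and express every change as a row with a signed weight. The Python sketch below is an illustration under that assumption, not Feldera's internal representation; `apply_changes` and the sample rows are hypothetical.

```python
from collections import Counter


def apply_changes(table, changes):
    """Apply a batch of (row, weight) changes to `table` in place.

    A weight of +1 inserts a row and -1 deletes it. An update is modeled
    as a delete of the old row followed by an insert of the new one.
    """
    for row, weight in changes:
        table[row] += weight
        if table[row] == 0:
            del table[row]  # drop rows whose multiplicity reached zero
    return table


# Insert-only stream, e.g. HTTP request events.
events = Counter()
apply_changes(events, [(("GET", "/home"), +1), (("GET", "/cart"), +1)])

# CDC-style update from a database: alice's balance changes from 100 to 80.
accounts = Counter({("alice", 100): 1})
apply_changes(accounts, [(("alice", 100), -1), (("alice", 80), +1)])
```

Both the insert-only stream and the database update go through the same function, which is the point of treating inserts and deletes as the common currency.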

The following diagram shows Feldera Platform's architecture.

[Figure: Feldera Platform architecture]

What is in this repository?

This repository comprises all the building blocks to run continuous analytics pipelines using Feldera Platform.

  • web UI: a web interface for writing SQL, setting up connectors, and managing pipelines.
  • pipeline-manager: serves the web UI and is the REST API server for building and managing data pipelines.
  • dbsp: the core engine that allows us to evaluate arbitrary queries incrementally.
  • SQL compiler: translates SQL programs into DBSP programs.
  • connectors: to stream data in and out of Feldera Platform pipelines.

Quick start with Docker

First, make sure you have Docker Compose installed.

Next, run the following command to download a Docker Compose file, and use it to bring up a Feldera Platform deployment suitable for demos, development and testing:

curl https://raw.githubusercontent.com/feldera/feldera/main/deploy/docker-compose.yml | docker compose -f - --profile demo up

It can take some time for the container images to download. About ten seconds after that, the Feldera web console will become available. Visit http://localhost:8080 in your browser to bring it up. We suggest going through our demo next.

Our Getting Started guide has more detailed instructions on running the demo.

Running Feldera from sources

To run Feldera from sources, first install all the required dependencies. These include the Rust toolchain (at least 1.75), Java (at least JDK 19), Maven and TypeScript.

After that, the first step is to build the SQL compiler:

cd sql-to-dbsp-compiler
mvn package -DskipTests

Next, from the repository root, run the pipeline-manager:

cargo run --bin=pipeline-manager --features pg-embed

As with the Docker instructions above, you can now visit http://localhost:8080 in your browser to see the Feldera web console.

Documentation

To learn more about Feldera Platform, we recommend going through the documentation.

Contributing

Most of the software in this repository is governed by an open-source license. We welcome contributions. Here are some guidelines.

Theory

Feldera Platform achieves its objectives by building on a solid mathematical foundation. The formal model that underpins our system, called DBSP, is described in the accompanying paper:

The model provides two things:

  1. Semantics. DBSP defines a formal language of streaming operators and queries built out of these operators, and precisely specifies how these queries must transform input streams to output streams.

  2. Algorithm. DBSP also gives an algorithm that takes an arbitrary query and generates an incremental dataflow program that implements this query correctly (in accordance with its formal semantics) and efficiently. Efficiency here means, in a nutshell, that the cost of processing a set of input events is proportional to the size of the input rather than the entire state of the database.
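
To illustrate what "proportional to the size of the input" means in practice, the following Python sketch maintains a single grouped count, comparable to `SELECT key, COUNT(*) ... GROUP BY key`, as changes arrive. It is a hand-written incremental version of one query, not the DBSP algorithm, which derives such programs automatically for arbitrary queries. The class name `IncrementalCountView` and the `(key, weight)` change encoding are assumptions made for this example.

```python
from collections import defaultdict


class IncrementalCountView:
    """Maintains per-key row counts over a table that changes over time."""

    def __init__(self):
        self.counts = defaultdict(int)  # state kept between steps

    def step(self, changes):
        """Consume input changes [(key, weight)] and return output changes.

        Output changes are ((key, count), weight) pairs: -1 retracts the
        previously reported count for a key, +1 asserts the new one.
        Only keys mentioned in `changes` are visited.
        """
        old_counts = {}
        for key, weight in changes:
            if key not in old_counts:
                old_counts[key] = self.counts[key]  # count before this batch
            self.counts[key] += weight

        output = []
        for key, old in old_counts.items():
            new = self.counts[key]
            if new == 0:
                del self.counts[key]  # keep the state free of empty groups
            if new == old:
                continue  # inserts and deletes cancelled out
            if old != 0:
                output.append(((key, old), -1))
            if new != 0:
                output.append(((key, new), +1))
        return output


view = IncrementalCountView()
first = view.step([("us", +1), ("us", +1), ("eu", +1)])
second = view.step([("us", -1)])
```

The second call touches only the `us` group, however many other groups the view holds. A full re-evaluation would instead scan the whole table on every change.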