/streamdal

Code-Native Data Pipelines

Primary language: TypeScript | License: Apache-2.0


Streamdal is an open-source 'Code Native Data Pipeline' solution for running data pipelines directly in your application code.

It is at least 10x faster, 10x cheaper and 10x easier to operate than traditional data pipelines.




Benefits

There are major benefits to running pipelines directly within your app:

  • Eliminates the need for a separate data pipeline infrastructure
    • Pipelines execute from within your app, using existing compute that your app is already using
  • Eliminates the need for a separate data pipeline team
    • No more waiting for the data pipeline team to make pipeline changes
  • Is ridiculously fast
    • Streamdal uses Wasm to execute pipelines at near-native speeds
  • Is actually real-time
    • Not "near real-time" or "max-30-seconds-real-time" - but actually real-time - data is processed as soon as your app reads or writes data
  • And many other reasons

Live Demo

You don't have to install the server or the console, or instrument any of your apps, to see Streamdal in action. We've got a live demo :)

The demo showcases:

  1. Real-time Data Transformation
    • View how the "Mask dot com emails" pipeline for the welcome service obfuscates the recipient field if the field contains ".com"
  2. Real-time Masking & Obfuscation
    • Observe how the "Scrub ICMP" pipeline masks object.ipv4_address if object.protocol equals "ICMP"
  3. Improved Debugging via a "tail-like" UI
    • Select any component and start a Tail request to watch its live output
  4. Real-time Performance Insights
    • Watch the read/write-per-second throughput for any component
  5. See All Services in Your Stack
    • Use the service map (data graph) UI to get a bird's eye view of all services and components and how they relate to each other.
    • The data graph is the "node map" of all services and components
  6. Real-time Schema Inference
    • All consumers and producers automatically infer the schemas of the payloads they read and write.
    • View the inferred schema to catch any unexpected changes, ensure data quality or just get a better understanding of the data your services are reading and writing.
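
To make the schema inference item more concrete, here is a minimal Python sketch of deriving a schema from a single JSON payload. It only illustrates the general idea and is not how Streamdal implements it; the `infer_schema` helper and the shape of its output are hypothetical.

```python
import json


def infer_schema(value):
    """Return a simple type description for a decoded JSON value.

    Hypothetical helper for illustration; not part of any Streamdal SDK.
    """
    if isinstance(value, dict):
        return {key: infer_schema(item) for key, item in value.items()}
    if isinstance(value, list):
        # Describe a list by the schema of its first element, if any.
        return [infer_schema(value[0])] if value else []
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "null"


payload = json.loads('{"object": {"field": true, "ipv4_address": "10.0.0.1"}}')
schema = infer_schema(payload)
print(schema)
```

Comparing the schema inferred from two payloads of the same operation is one simple way to spot an unexpected change, for example a field that used to be a number arriving as a string.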

You can read more about how this is achieved in the "how does it work?" docs.
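
The two masking pipelines in the demo can be approximated in plain Python to show what kind of logic they apply. This is a sketch under assumptions: the function names, the `mask` helper and the exact masked output (first two characters kept, the rest replaced with `*`) are made up for illustration, and real pipelines are configured in the console rather than written like this.

```python
import copy


def mask(text, keep=2, fill="*"):
    """Keep the first `keep` characters and replace the rest with `fill`."""
    return text[:keep] + fill * max(len(text) - keep, 0)


def scrub_icmp(event):
    """Mask object.ipv4_address when object.protocol equals "ICMP"."""
    result = copy.deepcopy(event)  # leave the caller's payload untouched
    obj = result.get("object", {})
    if obj.get("protocol") == "ICMP" and "ipv4_address" in obj:
        obj["ipv4_address"] = mask(obj["ipv4_address"])
    return result


def mask_dot_com_email(message):
    """Obfuscate the recipient field when it contains ".com"."""
    result = copy.deepcopy(message)
    recipient = result.get("recipient", "")
    if ".com" in recipient:
        result["recipient"] = mask(recipient)
    return result


icmp = scrub_icmp({"object": {"protocol": "ICMP", "ipv4_address": "10.0.0.1"}})
tcp = scrub_icmp({"object": {"protocol": "TCP", "ipv4_address": "10.0.0.1"}})
print(icmp["object"]["ipv4_address"])  # masked
print(tcp["object"]["ipv4_address"])   # unchanged, protocol is not ICMP
```

Both functions work on a copy, so the original payload stays available to the caller if the rule does not apply.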

Getting Started

Getting started consists of two steps:

  1. Installing the server, console and their dependencies
  2. Instrumenting your code with one of our SDKs

Install

The easiest way to get Streamdal running is via curl | bash:

curl -sSL https://sh.streamdal.com | bash

The install script will:

  1. Verify that you have git, docker and docker-compose installed
  2. Clone this repo to ~/streamdal
  3. Bring up all components via docker-compose
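
If you would like to check the prerequisites yourself before piping the script into bash, the first step can be sketched in Python as below. This is not the install script's actual logic; `missing_tools` is a hypothetical helper that only checks whether each tool can be found on your PATH.

```python
import shutil

# Tools the install script is described as requiring.
REQUIRED_TOOLS = ("git", "docker", "docker-compose")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found")
```

The `which` parameter exists only so the lookup can be swapped out, for example in a test.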

Once done:

🎉 Open http://localhost:8080 in your browser! 🎉

You should be presented with a beautiful (but empty) UI! To populate it, we will need to instrument some code. Onto the next section!

For alternative installation methods, check the docs dir.

Instrument

Once you've installed the server and console, you can instrument your code using one of our SDKs.

To see an example of a complete instrumentation, take a look at the Go demo client that is bundled with the ./apps/server.

Go

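package main

import (
   "context"
   "fmt"
   "log"

   streamdal "github.com/streamdal/go-sdk"
)

func main() {
   // Create a client that talks to the Streamdal server
   sc, err := streamdal.New(&streamdal.Config{
      ServerURL:   "streamdal-server.svc.cluster.local:8082",
      ServerToken: "1234",
      ServiceName: "billing-svc",
      ShutdownCtx: context.Background(),
   })
   if err != nil {
      log.Fatalf("unable to create streamdal client: %s", err)
   }

   // Pass the data your app has just read through the SDK
   resp, err := sc.Process(context.Background(), &streamdal.ProcessRequest{
      OperationType: streamdal.OperationTypeConsumer,
      OperationName: "new-order-topic",
      ComponentName: "kafka",
      Data:          []byte(`{"object": {"field": true}}`),
   })
   if err != nil {
      log.Fatalf("unable to process data: %s", err)
   }

   // Print the response so the variable is used and the example compiles
   fmt.Printf("process response: %+v\n", resp)
}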

Python

from streamdal import (OPERATION_TYPE_CONSUMER, ProcessRequest, StreamdalClient, StreamdalConfig)

client = StreamdalClient(
   cfg=StreamdalConfig(
      service_name="order-ingest",
      streamdal_url="streamdal-server.svc.cluster.local:8082",
      streamdal_token="1234",
   )
)

res = client.process(
   ProcessRequest(
      operation_type=OPERATION_TYPE_CONSUMER,
      operation_name="new-order-topic",
      component_name="kafka",
      data=b'{"object": {"field": true}}',
   )
)

Node.js

import { OperationType, Streamdal } from "@streamdal/node-sdk";

export const example = async () => {
  const streamdal = new Streamdal({
    streamdalUrl: "localhost:8082",
    streamdalToken: "1234",
    serviceName: "test-service-name",
    pipelineTimeout: "100",
    stepTimeout: "10",
  });

  const result = await streamdal.processPipeline({
    audience: {
      serviceName: "test-service-name",
      componentName: "kafka",
      operationType: OperationType.PRODUCER,
      operationName: "kafka-producer",
    },
    data: new TextEncoder().encode(JSON.stringify({ key: "value" })),
  });
};

Important

These are basic, minimal examples and should NOT be used in production code.

Refer to the instrumentation docs for more thorough directions.

How Does It Work?

Streamdal consists of three main components: the server, the console and the SDKs.

The basic flow is that you install the server and console, then wrap any reads or writes in your app with one of our SDKs. Once that's done, you will be able to see the app, and the data it is reading or writing, in the console or via the CLI.

You will also be able to enforce rules on your data (such as "this should be valid JSON", "message should contain a field called foo", "strip all email addresses" and so on).
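
To give a feel for what such rules check, here is a minimal Python sketch of the three examples mentioned above. It is not the Streamdal rule engine (rules run as Wasm inside the SDK); the function names and the deliberately simple email pattern are assumptions made for illustration.

```python
import json
import re

# Deliberately simple pattern; it does not cover every valid email address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_valid_json(raw):
    """Rule: this should be valid JSON."""
    try:
        json.loads(raw)
    except (ValueError, TypeError):
        return False
    return True


def has_field(raw, name):
    """Rule: the message should contain a top-level field called `name`."""
    try:
        decoded = json.loads(raw)
    except (ValueError, TypeError):
        return False
    return isinstance(decoded, dict) and name in decoded


def strip_emails(raw, replacement="[redacted]"):
    """Rule: strip all email addresses from a text payload."""
    return EMAIL_RE.sub(replacement, raw)


message = '{"foo": 1, "contact": "jane@example.com"}'
print(is_valid_json(message))
print(has_field(message, "foo"))
print(strip_emails(message))
```

The first two functions only inspect the payload, while the third returns a modified copy, which mirrors the difference between validating data and transforming it.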

Important

For a more in-depth explanation of the flow and the various components, visit our docs.

Repo Layout

This repo is a monorepo that has the following layout and usage:

# ┌── assets                 <--- Static assets 
# │   ├── img
# │   └── ...
# ├── apps
# │   ├── cli                <--- CLI UI 
# │   ├── console            <--- Web-based UI
# │   ├── docs               <--- https://docs.streamdal.com 
# │   ├── server             <--- Server component
# │   └── ...
# ├── docs
# │   ├── install
# │   │   ├── bare-metal
# │   │   ├── docker
# │   │   └── ...
# │   ├── instrument
# │   └── ...
# ├── libs
# │   ├── protos             <--- Common protobuf schemas
# │   ├── wasm               <--- Wasm funcs used in pipeline steps
# │   ├── wasm-detective     <--- Wasm lib used for data parsing and validation 
# │   ├── wasm-transformer   <--- Wasm lib used for data transformation
# │   └── ...
# ├── scripts                   
# │   ├── install
# │   │   └── install.sh     <--- Install script for installing Streamdal
# │   └── ...
# ├── LICENSE
# ├── Makefile               <--- Makefile with common tasks; run `make help` for more info
# └── README.md

Community

We're building Streamdal in the open and we'd love for you to join us!

Join our Discord!

Resources

Getting Help

Stuck? Something not working right? Have questions?

  • The first and easiest way to get help is to join our Discord
  • If you're not in the mood to chat, there are docs
  • If all else fails, open an issue!

Manifesto

To get a better understanding of what we're trying to do, our ethos, principles, and really, our state of mind - check out our manifesto.

Roadmap

You have control over what we're building - our roadmap is 100% public!

Feel free to stop by to discuss features, suggest new ones or just keep an eye on what's in the pipeline.

Contributing

We ❤️ contributions! But... before you craft a beautiful PR, please read through our contributing docs.

License

This project is licensed under the Apache-2.0 license.

See the LICENSE file for more info.