A Lightweight, Cloud-Native Stateful Distributed Dataflow Engine

Primary language: Rust. License: Apache License 2.0 (Apache-2.0).

Lightflus

Lightflus is a cloud-native distributed stateful dataflow engine.

Join the chat at https://gitter.im/lightflus/community

Important Notes

  1. A demo version of Lightflus has been released. You are welcome to try it!

  2. Your feedback is important, and we take it seriously!

Design Philosophy

Target Users

Lightflus is designed for most development teams, even those where no one is familiar with stream processing. Any team member can write a dataflow task and deploy it to production. Lightflus can connect to any event source (Kafka, MQTT, etc.) in your cloud infrastructure, process the events with a user-defined Dataflow, and emit the results to a storage sink (MySQL, Redis, etc.).

TypeScript API + Rust Runtime

Lightflus is powered by Deno and implemented in Rust, which ensures memory safety and real-time performance. We embed the V8 engine into the Lightflus engine with minimal dependencies, which keeps it light and fast. With the help of Deno, you can run TypeScript user-defined functions or WebAssembly bytecode (for better performance) in Lightflus through a stream-like API.

References

Lightflus draws mainly on Google's paper "The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing" and on Streaming Systems. Other papers in the field of stream processing are also important references for us.

Documentation

You can read the documentation for more details about Lightflus.

Community

You can join the Gitter community!

Set Up Lightflus

Start from Source

$ cargo run --manifest-path src/taskmanager/Cargo.toml
$ cargo run --manifest-path src/coordinator/Cargo.toml
$ cargo run --manifest-path src/apiserver/Cargo.toml

Start with Docker (Recommended)

$ docker-compose up

Try Lightflus

Preparation

  1. Install the Node.js environment
  2. Use WebStorm / VS Code to create a new Node.js project
  3. Initialize a TypeScript project
    1. Install the TypeScript dependency: npm install typescript
    2. Initialize tsconfig.json:
    yarn tsc --init
  4. Install the lightflus-api dependency:
    npm i lightflus-api

Deploy Your First Lightflus Task

We use word count as an example to show how to deploy a Lightflus dataflow task.

  1. Modify tsconfig.json

We recommend setting the properties in your tsconfig.json file as shown below:

{
  "compilerOptions": {
    "module": "commonjs",
    "target": "es2016",
    "sourceMap": true,
    "baseUrl": "./",
    "incremental": true,
    "skipLibCheck": true,
    "strictNullChecks": false,
    "forceConsistentCasingInFileNames": false,
    "strictPropertyInitialization": false,
    "esModuleInterop": true,
    "moduleResolution": "node"
  }
}

  2. Implement Word Count
// wordCount example

import {context} from "lightflus-api/src/stream/context";
import {kafka, redis} from "lightflus-api/src/connectors/connectors";
import ExecutionContext = context.ExecutionContext;
import Kafka = kafka.Kafka;
import Redis = redis.Redis;

async function wordCount(ctx: ExecutionContext) {
    // fetch string stream from kafka
    let source = Kafka
        .builder()
        .brokers(["kafka:9092"])
        // topic
        .topic("topic_2")
        // groupId
        .group("word_count")
        // deserialization type
        .build<string>(undefined, typeof "");

    // It will persist the counting values in Redis
    let sink = Redis.new<{ t0: number, t1: string }>()
        .host("redis")
        .keyExtractor((v) => v.t1)
        .valueExtractor((v) => v.t0.toString());

    // create a Dataflow
    let stream = source.createFlow(ctx);

    // We design the Dataflow
    await stream.flatMap(value => value.split(" ").map(v => {
        return {t0: 1, t1: v};
    }))
        .keyBy(v => v.t1)
        .reduce((v1, v2) => {
            return {t1: v1.t1, t0: v1.t0 + v2.t0};
        })
        // write the results into Redis sink
        .sink(sink)
        // Then execute
        .execute();
}

wordCount(ExecutionContext.new("wordCount", "default")).then();

  3. Compile the TypeScript code
$ yarn tsc -p .
  4. Set the environment variable
$ export LIGHTFLUS_ENDPOINT=localhost:8080
  5. Run the compiled JavaScript code
$ node wordCount.js
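
Step 4 above sets LIGHTFLUS_ENDPOINT to localhost:8080. If you want to catch a missing or malformed value before running the compiled task, a small check like the following may help. This is only a sketch: parseEndpoint and the Endpoint type are hypothetical names that are not part of lightflus-api, and the host:port format is an assumption inferred from the example value.

```typescript
// Hypothetical helper, not part of lightflus-api.
// Assumes the endpoint is written as host:port, as in the example value.
interface Endpoint {
    host: string;
    port: number;
}

function parseEndpoint(raw: string | undefined): Endpoint {
    if (!raw) {
        throw new Error("LIGHTFLUS_ENDPOINT is not set");
    }
    // Split on the last colon so the port is always the final segment.
    const idx = raw.lastIndexOf(":");
    if (idx <= 0) {
        throw new Error(`expected host:port, got "${raw}"`);
    }
    const host = raw.slice(0, idx);
    const port = Number(raw.slice(idx + 1));
    // Rejects empty, non-numeric, and out-of-range ports.
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`invalid port in "${raw}"`);
    }
    return { host, port };
}

const endpoint = parseEndpoint("localhost:8080");
console.log(endpoint.host, endpoint.port); // prints: localhost 8080
```

In a Node.js project you would likely pass process.env.LIGHTFLUS_ENDPOINT to it instead of a literal string.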

Make the Dataflow Run!

Send a message to Kafka:

hello hello hello world world world

Then read the counts from Redis:

redis> GET hello
"3"

redis> GET world
"3"

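To see why both keys end up at "3", the same flatMap, keyBy, and reduce logic can be replayed in plain TypeScript without a running cluster. The sketch below is not how Lightflus executes a Dataflow; runWordCount, toRecords, and the Map standing in for Redis are hypothetical names used only for illustration, and the filter for empty words is an addition that the word count example does not have.

```typescript
// In-memory sketch of the word count logic. No Lightflus, Kafka, or Redis involved.
type Count = { t0: number; t1: string };

// flatMap step: one record per word, each with a count of 1.
// The filter for empty words is an addition for this sketch.
function toRecords(message: string): Count[] {
    return message
        .split(" ")
        .filter((w) => w.length > 0)
        .map((w) => ({ t0: 1, t1: w }));
}

// keyBy + reduce + sink: keep one running record per key and
// overwrite the stored value each time that key is updated.
function runWordCount(messages: string[]): Map<string, string> {
    const state = new Map<string, Count>();  // per-key reduce state
    const store = new Map<string, string>(); // stands in for Redis
    for (const message of messages) {
        for (const record of toRecords(message)) {
            const previous = state.get(record.t1);
            const next: Count = previous === undefined
                ? record
                : { t1: previous.t1, t0: previous.t0 + record.t0 };
            state.set(next.t1, next);
            // Same extractors as the sink: key is t1, value is t0 as a string.
            store.set(next.t1, next.t0.toString());
        }
    }
    return store;
}

const store = runWordCount(["hello hello hello world world world"]);
console.log(store.get("hello"), store.get("world")); // prints: 3 3
```

Because the state is kept per key, sending a second message that contains "hello" would raise that key's stored value rather than reset it.
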
Contribution

Please read the CONTRIBUTING.md document.