
Python stream processing for analytics

Primary language: Python. License: Apache License 2.0 (Apache-2.0).


Beavers


Beavers is a Python library for stream processing, optimized for analytics.

It is used at Tradewell Technologies to calculate analytics and serve model predictions, in both real-time and batch jobs.

Key Features

  • Works in real-time mode (e.g. reading from Kafka) and in replay mode (e.g. reading from Parquet)
  • Optimized for analytics: it uses micro-batching instead of processing records one by one
  • Similar to incremental, it updates nodes in a DAG incrementally
  • Taking inspiration from Kafka Streams, there are two types of nodes in the DAG:
    • Stream: ephemeral micro-batches of events (cleared after every cycle)
    • State: durable state derived from streams
  • Clear separation between the business logic and the IO, so the same DAG can be used in real-time mode, used in replay mode, or easily tested
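
The features above can be illustrated with a small, self-contained sketch. It does not use the Beavers API: `ToyDag`, `Node`, `source_stream`, `stream`, `state` and `execute` are hypothetical names chosen for this example only. It shows the assumed behavior of the two node types (streams are cleared every cycle, state persists), a simple form of incremental updating, and how the DAG definition stays free of IO code.

```python
class Node:
    """A node in a toy DAG. Hypothetical names; this is not the Beavers API."""

    def __init__(self, func, inputs, is_stream, initial):
        self.func = func
        self.inputs = inputs
        self.is_stream = is_stream
        self.value = initial
        self.updated = False  # True if the node changed in the current cycle


class ToyDag:
    def __init__(self):
        # Nodes are stored in creation order. Inputs must exist before the
        # nodes that use them, so this order is already topological.
        self.nodes = []

    def _add(self, func, inputs, is_stream, initial):
        node = Node(func, list(inputs), is_stream, initial)
        self.nodes.append(node)
        return node

    def source_stream(self):
        return self._add(None, [], True, [])

    def stream(self, func, *inputs):
        return self._add(func, inputs, True, [])

    def state(self, func, initial, *inputs):
        return self._add(func, inputs, False, initial)

    def execute(self, batches):
        """Run one cycle; `batches` maps source nodes to this cycle's micro-batch."""
        for node in self.nodes:
            node.updated = False
            if node.is_stream:
                node.value = []  # streams are ephemeral: cleared every cycle
        for source, batch in batches.items():
            source.value = list(batch)
            source.updated = len(batch) > 0
        for node in self.nodes:
            # Incremental update: skip nodes whose inputs did not change.
            if node.func is None or not any(i.updated for i in node.inputs):
                continue
            args = [i.value for i in node.inputs]
            if node.is_stream:
                node.value = node.func(*args)
                node.updated = len(node.value) > 0
            else:
                # State nodes receive their previous value plus the new batches.
                node.value = node.func(node.value, *args)
                node.updated = True


# Business logic only: no Kafka or Parquet code appears in the DAG definition.
dag = ToyDag()
prices = dag.source_stream()
large = dag.stream(lambda batch: [p for p in batch if p >= 100], prices)
count = dag.state(lambda total, batch: total + len(batch), 0, large)

# The IO layer decides where batches come from (a live feed or a replayed file).
dag.execute({prices: [50, 120, 300]})
first_cycle = (large.value, count.value)
dag.execute({prices: [10]})
second_cycle = (large.value, count.value)
```

In this sketch, the second cycle leaves the `large` stream empty while `count` keeps its value from the first cycle, and the `count` function is not called at all because its input did not change. Feeding the same `dag` from a test fixture, a replayed file or a live consumer only changes what is passed to `execute`.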