/bytewax

A Python framework for building highly scalable dataflows.

Primary LanguageHTMLApache License 2.0Apache-2.0

Actions Status PyPI Bytewax User Guide

Bytewax is an open source Python framework for building highly scalable dataflows in a streaming or batch context.

Get started

Check out our getting started guide.

About

Bytewax lets you build Python based dataflows to process your data for augmentation, advanced analysis, machine learning and more. It is based on Timely Dataflow, which is a cyclic dataflow computational model. At a high-level, dataflow programming is a programming paradigm where program execution is conceptualized as data flowing through a series of operator based steps. Operators are the processing primitives of bytewax. Each of them gives you a “shape” of data transformation, and you give them functions to customize them to a specific task you need. The combination of each operator and their custom logic functions we call a dataflow step. You chain together steps in a dataflow to solve your high-level data processing problem.

At a high level, Bytewax provides a few major benefits:

  • The operators in Bytewax are largely “data-parallel”, meaning they can operate on independent parts of the data concurrently.
  • The ability to express higher-level control constructs, like iteration.
  • Bytewax allows you to develop and run your code locally, and then easily scale that code to multiple workers or processes without changes.
  • Bytewax can be used in both a streaming and batch context
  • Ability to leverage the Python ecosystem directly

Community

Slack Is the main forum for communication and discussion.

GitHub Issues is reserved only for actual issues. Please use the slack community for discussions.

Code of Conduct

Usage

Install the latest release with pip:

pip install bytewax

Example

Here is an example of a simple dataflow program using Bytewax:

from bytewax import Dataflow, run


flow = Dataflow()
flow.map(lambda x: x * x)
flow.capture()


if __name__ == "__main__":
    for epoch, x in sorted(run(flow, enumerate(range(10)))):
        print(x)

Running the program prints the following output:

0
1
4
9
16
25
36
49
64
81

For a more complete example, and documentation on the available operators, check out the User Guide.

For an exhaustive list of examples, checkout the /examples folder

License

Bytewax is licensed under the Apache-2.0 license.

Contributing

Contributions are welcome! This community and project would not be what it is without the contributors. All contributions, from bug reports to new features, are welcome and encouraged.



With ❤️ Bytewax