streamlet-dev/tributary

Hello from streamz

martindurant opened this issue · 5 comments

I see in #15 that you are already aware of us. https://streamz.readthedocs.io

We should compare notes and hopefully come up with something beneficial for all. I should note that streamz also has inputs from sockets/http/kafka and outputs to real-time graphics.

@martindurant happy to, this library is born out of a rejection of this notion, a dislike of the way time is handled in rxpy, as well as a desire to represent different types of "streams" in different ways (e.g. leverage the same code to create a forward-propagating stream, a lazy DAG, a currying pipeline, etc). My event loop section is empty because why re-invent the loop when streamz already does this well.

Given so much overlap, what do you think are the main things that streamz can learn from you, and would you be interested in contributing?

forward-propagating stream, a lazy DAG, a currying pipeline

I would appreciate more details on the meaning and differences between these.

its all dataflow in different forms. in my world there are 3 kinds

  • asynchronous/reactive, which is where inputs "tick" and push their data downstream into calculations/charts/outputs
  • pipelines (which is sugar for currying) where you basically stack functions with callbacks
  • lazy, which look like reactive on the surface, but instead of pushing data forward as things "tick", downstream fetches new data from upstream only if the data has changed

Github isn't rendering them for me but there are examples for each

Hm... I'll certainly look into those. Before having done that, it does sound an awful lot like territory that's completely covered by streamz:

  • "normal" nodes, which call their children when they are called; the call can be async (i.e., it may be delayed until the next event loop cycle) or not
  • time-based nodes, which are async with the event loop in-thread or not; they typically either make use of time to gather events for aggregation, or for polling a data-provider

In addition streamz cares about dataframe operations and connecting with Dask (which you can see as a different form of async, where the work is offloaded out of thread/process, usually in batches). It also has sinks to various outputs, including responsive graphics and a functional way to compose the computation graph, along with easy-to-make aggregation classes. I'm not trying to "sell" streamz here, just understand where we stand - which I will better do after I've had a look at your examples.
Certainly, streamz' documentation could use a lot of improvement too, so you may be facing the same problem :)

going to close this issue for now. I've completely rewritten the functionality of the streaming side of things. I think that under the right conditions, this library would easily be incorporated into another, but looking at the main candidates streamz, rxpy, and aiostream, all of them suffer one fundamental, non-technical problem.

I want to be able to do something like x + y + 5, or better yet be able to write a mathematical expression and generate a streaming graph with pluggable input and output adapters. I want the UX to be as simple as just using standard operators. I don't want the user to have to worry whether their input is a function, a generator, an async function, etc. And I want the forward propagation to mimic a complex event processor, so the user's mental model as they chain operators to be consistent.

x = t.KafkaSource(...)
window = t.Window(t.Apply(x, my_transform) + 10) / 25 
t.run(window)
graph = t.symbolic.from_equation('5 x**2 + 7xyz - x')
out = t.FileSink(graph(x=t.File(...), y=t.Const(15), z=t.WebsocketSource(...)))
t.run(out)

I think streamz gets pretty close to this, and now that my streaming side relies entirely on the asyncio event loop, theres not much difference in the engineering underneath. Looking now after having rewritten the entire codebase its even more similar:

https://github.com/python-streamz/streamz/blob/master/streamz/core.py#L184
https://github.com/timkpaine/tributary/blob/master/tributary/streaming/base.py#L59

This library has more than just streaming (the lazy graph stuff stands on its own), but I'm happy to contribute some of the operators and functionality into streamz if it seems possible to do so and it is amenable to the maintainers.