Fanout example
Thank you for such a great fan-in example!
I have a couple of questions:
- How do the filtering semantics work? Does the example produce an aggregated result as long as all required parts are available?
- Do you think Kafka Streams is convenient for implementing a fan-out pattern with routing rules? I'm looking for an alternative to the headers exchange in RabbitMQ that can route a given message to multiple output queues based on message attributes.
Hi!
- Thanks for bringing up this question, as this made me realise there is a bug in this project, which is directly related to your question. Kafka Streams has slightly different semantics for `.filter` when you compare `KStream`s and `KTable`s:
  - When you `filter` on a `KStream`, the behavior is what you would naively expect, i.e. the `KStream` that is returned by the `filter` is a stream that only contains those messages that satisfied the `filter`'s predicate.
  - `filter`s on a `KTable` work as what I like to call a 'deletion stage'. When the `filter`'s predicate returns false, the aggregate for the related key will be deleted (by producing `null` on the compacted changelog topic).

Sometimes, like in this example project, you want to have the `KStream` semantics while dealing with a `KTable`. As far as I know (but I may do some more research), the only way you can deal with that is to just do `kTable.toStream.filter(...)`.
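To make the difference between the two `filter` semantics concrete, here is a toy model in plain Scala collections. It does not use the Kafka Streams API: `FilterSemantics`, `streamFilter`, `tableFilter` and `materialize` are hypothetical names, and a tombstone (a `null` value in Kafka) is modelled as `None`. It is a simplified sketch of the behaviour described above, not of how Kafka Streams implements it.

```scala
object FilterSemantics {
  // An update for a key; None models a tombstone (a null value in Kafka).
  type Update[K, V] = (K, Option[V])

  // KStream-like filter: records failing the predicate are simply dropped.
  def streamFilter[K, V](records: Seq[(K, V)])(p: (K, V) => Boolean): Seq[(K, V)] =
    records.filter { case (k, v) => p(k, v) }

  // KTable-like filter ("deletion stage"): a failing update becomes a tombstone
  // for its key instead of disappearing.
  def tableFilter[K, V](updates: Seq[(K, V)])(p: (K, V) => Boolean): Seq[Update[K, V]] =
    updates.map { case (k, v) => (k, if (p(k, v)) Some(v) else None) }

  // Replays updates in order and keeps the latest value per key,
  // removing a key when a tombstone arrives.
  def materialize[K, V](updates: Seq[Update[K, V]]): Map[K, V] =
    updates.foldLeft(Map.empty[K, V]) {
      case (state, (k, Some(v))) => state.updated(k, v)
      case (state, (k, None))    => state - k
    }
}
```

With the updates `"a" -> 3`, `"a" -> 1`, `"b" -> 5` and the predicate `v >= 2`, the stream-style filter keeps `"a" -> 3` and `"b" -> 5`, whereas the table-style filter ends up with only `"b" -> 5` once materialized, because the later failing update for `"a"` deletes that key.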
- Simple fan-outs are quite easy to pull off in Kafka Streams with the `branch` operator:
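// K and V stand for the stream's key and value types.
val kStream: KStream[K, V] = ???
// One predicate per output branch, in priority order.
val predicates: Seq[(K, V) => Boolean] = ???
// branches(i) receives the records matched by predicates(i).
val branches: Array[KStream[K, V]] = kStream.branch(predicates: _*)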
Here the `branches` array contains sub-streams that only contain messages that satisfied the corresponding predicate in `predicates`. Entries are matched by index, and the predicates are evaluated in a short-circuiting fashion: each message ends up only in the sub-stream of the first predicate it satisfies, and messages that satisfy none of the predicates are dropped.
Depending on your needs, you may need to introduce some more complexity and nested branches. For example, when you want messages to show up in multiple sub-streams, you may have to combine this with `filter`. I feel the most relevant use cases are quite easily covered with these abstractions, though.
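As a rough illustration of why `branch` alone does not cover routing one message to several outputs, the following toy model in plain Scala collections contrasts first-match routing with applying one filter per route. It is a sketch, not Kafka Streams code: `FanOutModel`, `branchFirstMatch` and `filterPerRoute` are hypothetical names. In Kafka Streams terms, the second function would correspond to calling `filter` once per routing rule on the same source stream.

```scala
object FanOutModel {
  type Route[K, V] = (K, V) => Boolean

  // Models `branch`: each record goes to the first route whose predicate matches;
  // records matching no route are dropped.
  def branchFirstMatch[K, V](records: Seq[(K, V)], routes: Seq[Route[K, V]]): Seq[Seq[(K, V)]] = {
    val assigned = records.map { case (k, v) => ((k, v), routes.indexWhere(p => p(k, v))) }
    routes.indices.map(i => assigned.collect { case (kv, idx) if idx == i => kv })
  }

  // Models one `filter` per route: a record appears in every output whose
  // predicate it satisfies, so outputs may overlap.
  def filterPerRoute[K, V](records: Seq[(K, V)], routes: Seq[Route[K, V]]): Seq[Seq[(K, V)]] =
    routes.map(p => records.filter { case (k, v) => p(k, v) })
}
```

For example, with the routes "key is `order`" and "value is at least 100", the record `"order" -> 150` lands only in the first output under `branchFirstMatch`, but in both outputs under `filterPerRoute`.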
@CasperKoning, thank you for the detailed reply!
In the meantime I found that Confluent released KSQL, which should be an option for simple streaming scenarios.