CasperKoning/kafka-streams-fan-in-example

Fanout example

Opened this issue · 2 comments

DXist commented

Thank you for such great fan-in example!

I have a couple of questions:

  1. How does filtering semantics work? Does the example produce an aggregated result as long as all required parts are available?
  2. Do you think Kafka Streams is convenient for implementing fan-out pattern with routing rules? I'm looking for alternative to header exchange in RabbitMQ that allows to route given message to multiple output queues based on message attributes.

Hi!

  1. Thanks for bringing up this question, as this made me realise there is a bug in this project, which is directly related to your question. Kafka Streams has slightly different semantics for .filter when you compare KStreams and KTables:
    • When you filter on a KStream, the behavior is what you would naively expect, i.e. the KStream that is returned by the filter is a stream that only contains those messages that satisfied the filter's predicate.
    • filters on a KTable work as what I like to call a 'deletion stage'. When the filter's predicate returns false, the aggregate for the related key will be deleted (by producing null on the compacted changelog topic).

Sometimes, like in this example project, you want to have the KStreams semantics while dealing with a KTable. As far as I know (but I may do some more research), the only way you can deal with that is to just do kTable.toStream.filter(...).

  1. Simple fan-outs are quite easy to pull off in Kafka Streams with the branch operator:
val kStream: KStream[K, V] = ???
val predicates: Seq[(K, V) => Boolean] = ???
val branches: Array[KStream[K, V] = kStream.branch(predicates: _*)

here the branches array contains sub-streams that only contain messages that satisfied the corresponding predicate in predicates (entries are matched by index, and the predicates are evaluated in a short-circuiting fashion).

Depending on your needs, you may need to introduce some more complexity and nested branches. For example when you want messages to show up in multiple sub-streams, you may have to make some combination with filter. I feel the most relevant use cases are quite easily covered with these abstractions though.

DXist commented

@CasperKoning , thank you for detailed reply!

Meantime a found that Confluent released KSQL, that should be an option for simple streaming scenarios.