FundingCircle/jackdaw

Support for naming windowed join topics and other internal topics

kidpollo opened this issue · 3 comments

The current stable version of Kafka streams (2.3) supports naming some internal topics per KIP's.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Repartition+Topics+for+Joins+and+Grouping
https://cwiki.apache.org/confluence/display/KAFKA/KIP+230%3A+Name+Windowing+Joins

It seems that it is possible (in 2.3) to name some internal topics now but not windowed join topics. We actually discovered that in trunk it is implemented but not released yet. https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamImpl.java#L1109

Being able to name joins is important for long lived JOIN windows so that changes in the topology dont change the internal topic name and ignore the join history upon topology restart.

In the meant time we are gettin by a custom build 2.2 of kafka with support for Named Joins.
https://github.com/FundingCircle/kafka/pull/5/files#diff-5142e1d4a6410459d6bf6df98828e5afR920-R921

And a patch to join impl functions:

(defn join-windowed
  "Combines the values of two streams that share the same key using a windowed
  inner join. Adds the `join-name` parameter, which is used to name the internal
  storage topics. Requires patched version of the kafka streams jar."
  [this-kstream other-kstream value-joiner-fn windows
   {key-serde :key-serde this-value-serde :value-serde}
   {other-value-serde :value-serde}
   join-name]
  (clj-kstream
   (.join (kstream* this-kstream)
          (kstream* other-kstream)
          (value-joiner value-joiner-fn)
          windows
          (Joined/with key-serde this-value-serde other-value-serde join-name))))

(defn left-join-windowed
  "Combines the values of two streams that share the same key using a windowed
  left join. Adds the `join-name` parameter, which is used to name the internal
  storage topics. Requires patched version of the kafka streams jar."
  [this-kstream other-kstream value-joiner-fn windows
   {key-serde :key-serde this-value-serde :value-serde}
   {other-value-serde :value-serde}
   join-name]
  (clj-kstream
   (.leftJoin (kstream* this-kstream)
              (kstream* other-kstream)
              (value-joiner value-joiner-fn)
              windows
              (Joined/with key-serde this-value-serde other-value-serde join-name))))

Supporting naming of windowed joins seems quite critical as explained above but looking into supporting other internal topic custom naming support should also be looked at.

cddr commented

Good summary of the issue. Thanks @kidpollo

Can we make sure the functions proposed above maintain the existing behavior if called without a join-name?

Yes they do I believe @99-not-out tested this by building trunk

cddr commented

Yes they do I believe @99-not-out tested this by building trunk

Not sure I made myself clear. What I mean is that users of jackdaw should continue to be able to call e.g. the join-windowed function with only 6 args (as opposed to the 7 args required by the definition above).