sky-uk/kafka-message-scheduler

Kafka schedule topic compaction

inanme opened this issue · 4 comments

Can you please verify that compaction takes place on the schedule topic, assuming the schedule topic is already created as compacted on the broker side?

In our case it seems that compaction is not happening for the messages that are deleted by KMS (kafka-message-scheduler).

When the messages are consumed with kafka-console-consumer, the pair (message and tombstone) is consumed every time. The key below is 0ec45066-495f-4ec6-8a3b-d8a52fdc1a2a:

0ec45066-495f-4ec6-8a3b-d8a52fdc1a2a	42018-04-12T14:44:54.20401Z*scheduler-healthcheckH0ec45066-495f-4ec6-8a3b-d8a52fdc1a2a�{"id": "7889273e-455c-4f4d-addb-554d0463f130", "timestamp": "2018-04-12T14:44:54.204"}

0ec45066-495f-4ec6-8a3b-d8a52fdc1a2a	null
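
For reference, output like the above can be produced with the console consumer; a rough sketch, where the broker address and topic name are placeholders rather than the exact invocation:

  # Consume from the beginning with keys printed, so each record and its tombstone show up
  kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <schedule-topic> \
    --from-beginning --property print.key=true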

Hi Mert (@inanme),
Log compaction is done by the Kafka broker. There are a couple of broker settings that are important
when you use compacted topics. Can you check the broker properties listed below and let me know what the values are?

  • log.cleaner.enable
  • log.cleaner.delete.retention.ms
  • log.cleaner.min.cleanable.ratio
  • log.cleaner.min.compaction.lag.ms

Also, can you use the kafka-topics command and provide the configuration of your topic?
Some of the broker properties can be overridden per topic.
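
For example, something along these lines should show both the broker-level properties and the per-topic overrides; this is only a sketch, and the config file path, ZooKeeper address and topic name are assumptions to adapt to your environment:

  # Broker-level cleaner settings are normally static, so check the broker's server.properties
  grep -E '^log\.cleaner\.' /etc/kafka/server.properties

  # Per-topic overrides appear under "Configs:" in the describe output
  # (newer Kafka versions take --bootstrap-server instead of --zookeeper)
  kafka-topics.sh --zookeeper localhost:2181 --describe --topic <your-schedule-topic>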

We currently have the following:
log.cleaner.enable=true set on the broker.

Set on the topic:
Topic:ACCOUNT_JANITOR-SCHEDULE PartitionCount:8 ReplicationFactor:2 Configs:min.cleanable.dirty.ratio=0.01,min.compaction.lag.ms=1,delete.retention.ms=0,segment.ms=100,cleanup.policy=compact,delete
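
For completeness, per-topic overrides like these can be applied or adjusted with kafka-configs; a rough sketch, assuming a local ZooKeeper (the square brackets group the comma-separated cleanup.policy value):

  kafka-configs.sh --zookeeper localhost:2181 --alter \
    --entity-type topics --entity-name ACCOUNT_JANITOR-SCHEDULE \
    --add-config 'min.cleanable.dirty.ratio=0.01,min.compaction.lag.ms=1,delete.retention.ms=0,segment.ms=100,cleanup.policy=[compact,delete]'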

@inanme @mishamo I've reproduced the behaviour that you are talking about (a rough sketch of equivalent console commands follows the steps).
Steps:

  1. send 10 messages with the same key
  2. wait
  3. consume from beginning
    Result: 2 messages with the same key consumed
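
The steps can be reproduced with the standard console tools; a rough sketch, where the topic name, broker address and payloads are placeholders and not the exact commands used:

  # Produce 10 records with the same key; parse.key/key.separator make the console
  # producer treat each "key:value" line as a keyed record
  for i in $(seq 1 10); do echo "A:message-$i"; done | \
    kafka-console-producer.sh --broker-list localhost:9092 --topic test-compaction \
      --property parse.key=true --property key.separator=:

  # After waiting, read the topic back with keys printed to count the survivors per key
  kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-compaction \
    --from-beginning --property print.key=true --timeout-ms 10000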

However, when I added some random messages with random keys, I got only one message at the end:
Steps:

  1. send 5 messages with key = A
  2. send hundreds of random messages (kafka-producer-perf-test; a keyed alternative is sketched after this list)
  3. send 5 messages with key = A
  4. send hundreds of random messages (kafka-producer-perf-test)
  5. wait
  6. consume from beginning
    Result: 1 message with key = A consumed.
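
If kafka-producer-perf-test is not convenient, a keyed console-producer loop can serve as the filler traffic in steps 2 and 4; a rough sketch with placeholder names, not the exact commands used:

  # Hypothetical filler: a few hundred records with distinct keys, enough to roll the
  # active segment (the topic's segment.ms / segment.bytes control how quickly it rolls)
  for i in $(seq 1 500); do echo "filler-$i:payload-$i"; done | \
    kafka-console-producer.sh --broker-list localhost:9092 --topic test-compaction \
      --property parse.key=true --property key.separator=: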

I believe this is the correct behaviour. The log cleaner never compacts the active (most recently written) segment, so records that are still in it are kept as-is. By producing more messages I forced the segments to roll, which moved the key=A records out of the active head of the log and into the older tail, where the cleaner could compact them.

Closing this as @wojda-sky's summary above explains the behaviour that was being observed.