redpanda-data/kminion

Support producer compression in e2e feature

jutley opened this issue · 6 comments

As a rule of thumb, we always try to configure our Kafka clients to handle data compression so the extra overhead doesn't have to happen on the Kafka brokers. This currently isn't possible in Kminion, but it is supported in the underlying Kafka library. It would be nice to expose a configuration option to set the producer compression type.
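For reference, the underlying library (franz-go) already exposes producer compression as a client option, so a KMinion config knob could translate fairly directly. A minimal sketch, where the `kgo` option and codec constructors are real franz-go APIs but wiring them to a KMinion config key is my assumption:

```go
package main

import "github.com/twmb/franz-go/pkg/kgo"

func main() {
	// Sketch: a KMinion "producer compression" setting could translate to
	// franz-go's ProducerBatchCompression client option.
	client, err := kgo.NewClient(
		kgo.SeedBrokers("localhost:9092"),
		// Prefer snappy, and fall back to uncompressed if the broker
		// doesn't support it.
		kgo.ProducerBatchCompression(kgo.SnappyCompression(), kgo.NoCompression()),
	)
	if err != nil {
		panic(err)
	}
	defer client.Close()
}
```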

weeco commented

Hey @jutley ,
can you clarify here? Franz-go should use snappy compression by default when producing. Do you see uncompressed record batches in your end2end topic?

We saw that the metric `kafka_server_brokertopicmetrics_producemessageconversions_total` for the kminion topic has a constant non-zero rate. Usually this is a result of the producer not compressing data, and I didn't see any code around compression. The Java clients default to no compression, so I assumed the same would be true for this client. That said, I do now see the defaults you are referring to, so I'm not sure why we are seeing those message conversions.

Looking at the rate of bytes in on that topic, the values seem to suggest a lack of compression on ingestion.

- 60 messages per second
- 138 bytes per message
- 60 produce requests per second (1 message per batch)
- Around 12.1 KiB per second observed. With no overhead and no batching, we'd expect about 8.1 KiB per second.
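A quick sanity check of those numbers (1 KiB = 1024 bytes; the implied per-message overhead is just the residual, which would include record-batch and protocol framing):

```python
# Back-of-the-envelope check of the ingress numbers above.
msgs_per_sec = 60    # messages per second
bytes_per_msg = 138  # payload bytes per message

payload_rate_kib = msgs_per_sec * bytes_per_msg / 1024
print(f"expected payload rate: {payload_rate_kib:.1f} KiB/s")  # ~8.1 KiB/s

observed_kib = 12.1  # observed bytes-in rate on the topic
overhead_per_msg = (observed_kib * 1024 - msgs_per_sec * bytes_per_msg) / msgs_per_sec
print(f"implied per-message overhead: {overhead_per_msg:.0f} bytes")
```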

Thinking this through a bit more, kminion doesn't batch. When I manually ran a single kminion message through the snappy algorithm, the result was actually a bit larger than the original. So the metrics above don't actually tell us whether the payloads are being compressed.
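That observation generalizes: for a single small payload, compression framing overhead can exceed any savings, so the output grows. A stand-in demonstration using stdlib zlib (Python doesn't ship snappy; random bytes are the worst case for compression, whereas kminion's real payloads are more structured, but tiny inputs pay the framing cost either way):

```python
import os
import zlib

# A 138-byte payload, matching the per-message size discussed above.
payload = os.urandom(138)

compressed = zlib.compress(payload)
print(len(payload), len(compressed))
assert len(compressed) > len(payload)  # framing overhead exceeds any savings
```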

weeco commented

Whether or not a record batch is compressed can be inspected by looking at the record batch attributes. If you run https://github.com/redpanda-data/kowl you can inspect that; see the screenshot below (in that case the batch is uncompressed, but I think that's because the default in franz-go has recently changed to produce compressed record batches).
(screenshot, 2022-04-27: Kowl record batch view showing the batch attributes)
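For anyone checking by hand instead of through Kowl: in the Kafka record batch format (magic v2), the low 3 bits of the 16-bit attributes field encode the compression codec, per the Kafka protocol specification, so the codec can be read straight off the batch header:

```python
# Compression codec ids in the low 3 bits of a Kafka record batch's
# "attributes" field, per the Kafka protocol specification.
CODECS = {0: "none", 1: "gzip", 2: "snappy", 3: "lz4", 4: "zstd"}

def batch_compression(attributes: int) -> str:
    """Return the compression codec encoded in a record batch's attributes."""
    return CODECS.get(attributes & 0x07, "unknown")

print(batch_compression(0))  # "none" -> an uncompressed batch
print(batch_compression(2))  # "snappy"
```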

However, as you pointed out, the compression ratio wouldn't be that great because we send single-record batches rather frequently in order to measure latencies properly.

Going to close this, because it isn't critical for me and I don't have time to investigate the details around it.