Aggregation in Java combines different partition keys to one request
mustafaakin-atl commented
To test my understanding of the library, I created the following code:
RecordAggregator aggregator = new RecordAggregator();
byte[] data1 = "wow".getBytes();
byte[] data2 = "oh-yeah".getBytes();
AggRecord rec1 = aggregator.addUserRecord("key-1", data1);
AggRecord rec2 = aggregator.addUserRecord("key-2", data2);
AggRecord g1 = aggregator.clearAndGet();
AggRecord g2 = aggregator.clearAndGet();
So I assumed it would create a separate aggregated record for each partition key. However, it does not work like that:
System.out.println(g1.getPartitionKey());                            // key-1
System.out.println(g1.getNumUserRecords());                          // 2
System.out.println(g2);                                              // null
System.out.println(g1.toPutRecordsRequestEntry().getPartitionKey()); // key-1
I get that with random partition keys we would lose the benefits of aggregation, because nothing could be aggregated. However, with the code above, the record with partition key key-2 would land in the key-1 partition, which might be undesirable for applications that want each key to be processed from a single partition.
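If one aggregate per partition key is what you need, one option is to bucket user records by key yourself and then aggregate each bucket separately (for example, with a fresh RecordAggregator per key; that this yields one aggregate per key is an assumption, not something confirmed in this thread). A minimal, library-free sketch of the bucketing step; `PerKeyBatcher` is a hypothetical helper, not part of the aggregation library:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the aggregation library): buckets user
// records by partition key so each bucket can later be aggregated on its own.
class PerKeyBatcher {
    // LinkedHashMap keeps partition keys in first-seen order.
    private final Map<String, List<byte[]>> buckets = new LinkedHashMap<>();

    void add(String partitionKey, byte[] data) {
        buckets.computeIfAbsent(partitionKey, k -> new ArrayList<>()).add(data);
    }

    // Returns all buffered records grouped by key and resets the batcher.
    Map<String, List<byte[]>> drain() {
        Map<String, List<byte[]>> out = new LinkedHashMap<>(buckets);
        buckets.clear();
        return out;
    }

    public static void main(String[] args) {
        PerKeyBatcher batcher = new PerKeyBatcher();
        batcher.add("key-1", "wow".getBytes(StandardCharsets.UTF_8));
        batcher.add("key-2", "oh-yeah".getBytes(StandardCharsets.UTF_8));
        batcher.add("key-1", "again".getBytes(StandardCharsets.UTF_8));

        for (Map.Entry<String, List<byte[]>> e : batcher.drain().entrySet()) {
            // In real code, each entry would be fed to its own aggregator instance.
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " record(s)");
        }
    }
}
```

The trade-off is the one noted above: keys that rarely repeat produce small buckets, so little is gained from aggregating them.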
IanMeyers commented
Yes, this is the expected behaviour, which you can change by adding an ExplicitHashKey value. In general, to achieve higher compression rates, records with different partition keys will be grouped together by the KPL unless an ExplicitHashKey (EHK) is supplied.
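For context on why an ExplicitHashKey changes routing: Kinesis maps a partition key to a shard by taking the MD5 hash of the key as an unsigned 128-bit integer, and an ExplicitHashKey overrides that hash. The sketch below computes that value for a key, assuming the key is hashed as its UTF-8 bytes; it also estimates a shard index assuming the stream's shards split the hash key range evenly (real streams can have uneven ranges after resharding). `HashKeys`, `hashKeyFor`, and `shardIndexFor` are illustrative names, not library API:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch, not library API.
class HashKeys {
    // MD5 of the partition key (assumed UTF-8), read as an unsigned 128-bit integer.
    static BigInteger hashKeyFor(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            // Signum 1 treats the 16 digest bytes as an unsigned big-endian value.
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required by the Java platform", e);
        }
    }

    // Assumption: shardCount shards split [0, 2^128) into equal ranges.
    // Index = floor(hashKey * shardCount / 2^128).
    static int shardIndexFor(BigInteger hashKey, int shardCount) {
        return hashKey.multiply(BigInteger.valueOf(shardCount)).shiftRight(128).intValueExact();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"key-1", "key-2"}) {
            BigInteger h = hashKeyFor(key);
            // The ExplicitHashKey field of a put request takes a decimal string.
            System.out.println(key + " -> hash " + h.toString(10)
                    + ", shard " + shardIndexFor(h, 4) + " of 4");
        }
    }
}
```

Passing a record's own hash value as its ExplicitHashKey is what keeps its routing independent of whichever partition key the aggregated record ends up carrying.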