awslabs/kinesis-aggregation

Aggregation in Java combines different partition keys to one request


To test my understanding of the library, I created the following code:

    RecordAggregator aggregator = new RecordAggregator();

    byte[] data1 = "wow".getBytes();
    byte[] data2 = "oh-yeah".getBytes();

    AggRecord rec1 = aggregator.addUserRecord("key-1", data1);
    AggRecord rec2 = aggregator.addUserRecord("key-2", data2);

    AggRecord g1 = aggregator.clearAndGet();
    AggRecord g2 = aggregator.clearAndGet();

So I assumed it would create different aggregations for different partition keys. However, it does not work like that.

    System.out.println(g1.getPartitionKey());     // key-1
    System.out.println(g1.getNumUserRecords());   // 2

    System.out.println(g2);                       // null

    System.out.println(g1.toPutRecordsRequestEntry().getPartitionKey());   // key-1

I get that with random partition keys we would lose the benefits, because nothing could be aggregated. However, with the above code, the record with partition key key-2 would land in the partition for key-1, which might be undesirable for applications that want to work on a single partition.

Yes, this is the expected behaviour, which you can modify by adding an ExplicitHashKey (EHK) value. In general, to achieve higher compression rates, common partition keys will be grouped by the KPL unless an EHK is supplied.
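
For anyone wondering what an ExplicitHashKey value looks like: Kinesis maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer, and an EHK is that integer written as a decimal string. Below is a minimal sketch of deriving such a value from a partition key. `HashKeyDemo` and `hashKeyFor` are hypothetical names, not part of kinesis-aggregation, and this does not show how `RecordAggregator` itself treats the value.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of kinesis-aggregation.
class HashKeyDemo {

    // Returns the MD5 hash of the partition key as an unsigned 128-bit
    // integer in decimal form, the format used for an ExplicitHashKey.
    static String hashKeyFor(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            // Signum 1 makes BigInteger read the 16 bytes as a non-negative number.
            return new BigInteger(1, digest).toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        for (String key : new String[] {"key-1", "key-2"}) {
            System.out.println(key + " -> " + hashKeyFor(key));
        }
    }
}
```

Assuming the three-argument overload `addUserRecord(partitionKey, explicitHashKey, data)` is available in your version of the library, the string returned here is the kind of value you would pass as the second argument.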