[ HSL ] Dict\chunk() behaves differently from doc block when given a KeyedTraversable with duplicate keys
lexidor opened this issue · 0 comments
Describe the bug
When given a KeyedTraversable with duplicate keys, Dict\chunk() will return chunks smaller than the chunk size. Some (but not all) duplicate keys will be lost.
* Returns a vec containing the original dict split into chunks of the given
* size.
*
* If the original dict doesn't divide evenly, the final chunk will be
* smaller.
The vec will not contain all the elements of the KeyedTraversable ("original dict") and the chunks will not be of the "given size".
Standalone code, or other way to reproduce the problem
function keyed_traversable_with_duplicate_keys(
): KeyedTraversable<string, int> {
// a => 0, b => 1, c => 2, a => 3, b => 4, c => 5, ...
for ($i = 0; $i < 15; $i += 3) {
yield 'a' => $i;
yield 'b' => $i + 1;
yield 'c' => $i + 2;
}
}
<<__EntryPoint>>
function main(): void {
echo \json_encode(\HH\Lib\Dict\chunk(keyed_traversable_with_duplicate_keys(), 5));
// [{"a":3,"b":4,"c":2},{"c":8,"a":9,"b":7},{"b":13,"c":14,"a":12}]
}
Steps to reproduce the behavior:
- Run the repro listed above
Expected behavior
There is no "correct" behavior for a KeyedTraversable with duplicate keys that makes much sense.
-
Consuming items until the accumulating dict is 5 elements long would result in
[{"a":12,"b":13,"c":14}]
. -
The current behavior silently drops elements from the provided KeyedTraversable. The elements that are dropped depend on the requested size. The order of the sub dicts differs from the order of the supplied KeyedTraversable.
My preferred solution would be to restrict the input type to KeyedContainer<Tk, Tv>
. KeyedContainers can not have duplicate keys. This forces callers with a KeyedTraversable to call the function like this:
Dict\chunk(dict(keyed_traversable_with_duplicate_keys()), 5);
The loss of data will be more reasonable this way. Later values of duplicate keys overwrite previous occurrences. The order of the elements is maintained. The sub dicts will all be of the correct size, except for the trailing dict.
In order to support users who depended on the current behavior, I'd move the current implementation to the Legacy_FIXME
namespace.
Actual behavior
[{"a":3,"b":4,"c":2},{"c":8,"a":9,"b":7},{"b":13,"c":14,"a":12}]
Environment
- Operating system
'Ubuntu 18.04'
- Installation method
'apt-get with dl.hhvm.com repository'
- HHVM Version
HipHop VM 4.169.0 (rel) (non-lowptr)
Compiler: 1663640616_852546864
Repo schema: ce4145c518e16986e860178941c05dc9f11eab61
Additional context
This behavior has existed since the first public release of the hsl.
/**
* Returns a vec containing the original dict split into chunks of the given
* size. If the original dict doesn't divide evenly, the final chunk will be
* smaller.
*/
function chunk<Tk, Tv>(
KeyedTraversable<Tk, Tv> $traversable,
int $size,
): vec<dict<Tk, Tv>> {
invariant($size > 0, 'Chunk size must be positive.');
$result = vec[];
$ii = 0;
foreach ($traversable as $key => $value) {
if ($ii % $size === 0) {
$result[] = dict[];
}
$result[(int)($ii / $size)][$key] = $value;
$ii++;
}
return $result;
}
/**
* Returns a vec containing the original dict split into chunks of the given
* size.
*
* If the original dict doesn't divide evenly, the final chunk will be
* smaller.
*
* Time complexity: O(n)
* Space complexity: O(n)
*/
function chunk<Tk as arraykey, Tv>(
KeyedTraversable<Tk, Tv> $traversable,
int $size,
)[]: vec<dict<Tk, Tv>> {
invariant($size > 0, 'Expected positive chunk size, got %d.', $size);
$result = vec[];
$ii = 0;
$chunk_number = -1;
foreach ($traversable as $key => $value) {
if ($ii % $size === 0) {
$result[] = dict[];
$chunk_number++;
}
$result[$chunk_number][$key] = $value;
$ii++;
}
return $result;
}