Items lost when chunking heterogeneous input pages
NathanBaulch opened this issue · 3 comments
Relatively new to the library but enjoying it so far. However, I've run into a chunking problem when working with heterogeneous pages of flattened data. Any time a chunk crosses a boundary between input pages, items are lost because each page's integer keys start again at 0 and so overlap.
Here's a working example:
```php
$data = [['a', 'b'], ['c', 'd', 'e'], ['f', 'g', 'h', 'i']];
$flat = \iter\flatMap(function ($x) { return $x; }, $data);
foreach (\iter\chunk($flat, 3, true) as $chunk) {
    foreach ($chunk as $key => $val) {
        print "$key: $val\n";
    }
}
```
Output:
```
0: c
1: b
1: d
2: e
0: f
1: g
2: h
3: i
```
Notice that `a` is missing and `b` and `c` are out of order. This is because both `a` and `c` share the same `0` key within the first chunk, so `c` overwrites `a`.
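The overwrite is plain PHP array behaviour: assigning to a key that already exists replaces the value but keeps the key's original position, which is why `c` is printed before `b`. A minimal sketch that rebuilds the first chunk by hand (no library involved; the three assignments mimic what a key-preserving chunk does with `0 => 'a'`, `1 => 'b'`, `0 => 'c'`):

```php
<?php
$chunk = [];
$chunk[0] = 'a'; // first page, key 0
$chunk[1] = 'b'; // first page, key 1
$chunk[0] = 'c'; // second page restarts at key 0: 'a' is overwritten in place

foreach ($chunk as $key => $val) {
    print "$key: $val\n"; // prints "0: c" then "1: b"
}
```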
My workaround is to introduce a `preserveKeys` argument on the `chunk` function:
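```php
function chunk($iterable, $size, $preserveKeys = false) {
    //snip
    foreach ($iterable as $key => $value) {
        if ($preserveKeys) {
            $chunk[$key] = $value; // keep the source key; a repeated key overwrites
        } else {
            $chunk[] = $value;     // append, so keys are renumbered 0, 1, 2, ...
        }
        //snip
    }
    //snip
}
```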
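For reference, a self-contained sketch of how the patched function might look with the snipped parts filled in. This is a reconstruction, not the library's actual source: it is named `chunkSketch()` (hypothetical) so it can run alongside `\iter\chunk`, and `flattenPages()` (also hypothetical) stands in for `\iter\flatMap`, assuming, as the output above suggests, that flattening keeps each page's inner keys.

```php
<?php
// Hypothetical stand-alone version of chunk() with an opt-in $preserveKeys flag.
function chunkSketch(iterable $iterable, int $size, bool $preserveKeys = false): \Generator
{
    if ($size <= 0) {
        throw new \InvalidArgumentException('Chunk size must be positive');
    }

    $chunk = [];
    $count = 0; // items consumed, not count($chunk): with colliding keys these differ
    foreach ($iterable as $key => $value) {
        if ($preserveKeys) {
            $chunk[$key] = $value; // repeated keys overwrite earlier items
        } else {
            $chunk[] = $value;     // fresh 0-based keys, nothing is lost
        }
        if (++$count === $size) {
            yield $chunk;
            $chunk = [];
            $count = 0;
        }
    }

    if ($count > 0) {
        yield $chunk; // trailing partial chunk
    }
}

// Hypothetical flattener that keeps inner keys (0, 1, ... restart on every page).
function flattenPages(array $pages): \Generator
{
    foreach ($pages as $page) {
        yield from $page;
    }
}

$data = [['a', 'b'], ['c', 'd', 'e'], ['f', 'g', 'h', 'i']];
$chunks = iterator_to_array(chunkSketch(flattenPages($data), 3), false);
$lossy  = iterator_to_array(chunkSketch(flattenPages($data), 3, true), false);
```

With the default `false` the chunks come out as `a b c`, `d e f`, `g h i`; passing `true` reproduces the lossy output above, because chunk boundaries are counted by items consumed rather than by the size of the array.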
Is there a better way to handle this? Perhaps a `chunkWithKeys` function that follows the `toArray`/`toArrayWithKeys` pattern? Or a `chunkWithoutKeys` function to avoid breaking the existing `chunk` function?
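A rough sketch of the two-function split, assuming the names from the question; `chunkWithKeys()`, `chunkWithoutKeys()` and the helper `reindex()` are all hypothetical, not existing library functions. The design point is that the key-dropping variant has to discard keys before items are collected: calling `array_values()` on chunks that were already built with colliding keys would be too late, since the overwritten items are already gone.

```php
<?php
// Hypothetical: key-preserving chunking (duplicate keys overwrite).
function chunkWithKeys(iterable $iterable, int $size): \Generator
{
    if ($size <= 0) {
        throw new \InvalidArgumentException('Chunk size must be positive');
    }
    $chunk = [];
    $count = 0;
    foreach ($iterable as $key => $value) {
        $chunk[$key] = $value;
        if (++$count === $size) {
            yield $chunk;
            $chunk = [];
            $count = 0;
        }
    }
    if ($count > 0) {
        yield $chunk;
    }
}

// Hypothetical helper: re-yield values with fresh keys 0, 1, 2, ... across the whole input.
function reindex(iterable $iterable): \Generator
{
    foreach ($iterable as $value) {
        yield $value;
    }
}

// Hypothetical: key-dropping chunking built on top of chunkWithKeys().
function chunkWithoutKeys(iterable $iterable, int $size): \Generator
{
    // Keys are unique after reindex(), so nothing collides; array_values() only renumbers.
    foreach (chunkWithKeys(reindex($iterable), $size) as $chunk) {
        yield array_values($chunk);
    }
}

$pages = [['a', 'b'], ['c', 'd', 'e'], ['f', 'g', 'h', 'i']];
$flat = (function () use ($pages) {
    foreach ($pages as $page) {
        yield from $page; // inner keys restart at 0 on every page
    }
})();
$result = iterator_to_array(chunkWithoutKeys($flat, 3), false);
```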
Yeah, this is a problem. For iterator-to-array conversions we should not be keeping keys by default, as this causes issues very often. I like your suggestion of adding a `$preserveKeys` flag defaulting to `false` (even if this is a minor backward-compatibility break).