nikic/iter

Items lost when chunking heterogeneous input pages

NathanBaulch opened this issue · 3 comments

I'm relatively new to the library but enjoying it so far. However, I've run into a chunking problem when working with heterogeneous pages of flattened data. Any time a chunk crosses an input data page boundary, items are lost because the integer keys overlap.

Here's a working example:

$data = [['a', 'b'], ['c', 'd', 'e'], ['f', 'g', 'h', 'i']];
$flat = \iter\flatMap(function ($x) { return $x; }, $data);
foreach (\iter\chunk($flat, 3, true) as $chunk) {
    foreach ($chunk as $key => $val) {
        print "$key: $val\n";
    }
}

Output:

0: c
1: b
1: d
2: e
0: f
1: g
2: h
3: i

Notice that a is missing and that b and c are out of order. This is because a and c share the same key, 0, within the first chunk, so c overwrites a while keeping a's position in the array.
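
The collision can be reproduced without the library at all. This is only a minimal sketch: the $pairs list is a hand-written stand-in for the key/value pairs the flattened iterator would yield for the first chunk, not something taken from the library itself.

```php
<?php
// Key/value pairs in the order the flattened iterator yields them.
// Each inner page restarts its integer keys at 0 (assumed stand-in data).
$pairs = [[0, 'a'], [1, 'b'], [0, 'c']];

$chunk = [];
foreach ($pairs as [$key, $val]) {
    // The third pair reuses key 0, so 'c' replaces 'a' in the first slot.
    $chunk[$key] = $val;
}

foreach ($chunk as $key => $val) {
    print "$key: $val\n";
}
// prints:
// 0: c
// 1: b
```

Three items went in but only two come out, which matches the first two lines of the output above.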

My workaround is to introduce a preserveKeys argument on the chunk function:

function chunk($iterable, $size, $preserveKeys = false) {
    //snip
    foreach ($iterable as $key => $value) {
        if ($preserveKeys) {
            $chunk[$key] = $value;
        } else {
            $chunk[] = $value;
        }
        //snip
    }
    //snip
}

Is there a better way to handle this?

Perhaps a chunkWithKeys function that follows the toArray/toArrayWithKeys pattern? Or a chunkWithoutKeys function to avoid breaking the existing chunk function?
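
For reference, here is a self-contained sketch of how the flag could behave end to end. The names chunkSketch and flattenSketch are hypothetical helpers written for this illustration; they are not the library's functions, and the parts I snipped from the real chunk implementation are not reproduced here.

```php
<?php
// Hypothetical chunking helper illustrating the $preserveKeys idea.
function chunkSketch(iterable $iterable, int $size, bool $preserveKeys = false): \Generator {
    if ($size <= 0) {
        throw new \InvalidArgumentException('Chunk size must be positive');
    }
    $chunk = [];
    $count = 0; // tracked separately: count($chunk) undercounts after a key collision
    foreach ($iterable as $key => $value) {
        if ($preserveKeys) {
            $chunk[$key] = $value; // may overwrite an earlier item with the same key
        } else {
            $chunk[] = $value;     // reindex from 0, so nothing is overwritten
        }
        if (++$count === $size) {
            yield $chunk;
            $chunk = [];
            $count = 0;
        }
    }
    if ($count !== 0) {
        yield $chunk; // trailing partial chunk
    }
}

// Hypothetical flattening helper; keys restart at 0 for every page.
function flattenSketch(array $pages): \Generator {
    foreach ($pages as $page) {
        yield from $page;
    }
}

$data = [['a', 'b'], ['c', 'd', 'e'], ['f', 'g', 'h', 'i']];
$chunks = [];
foreach (chunkSketch(flattenSketch($data), 3) as $chunk) {
    $chunks[] = $chunk;
}
// $chunks is [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
```

With the flag left at false, every item survives and stays in input order; passing true brings back the key-preserving behaviour for callers whose keys are known to be unique.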

nikic commented

Yeah, this is a problem. For iterator-to-array conversions we should not be keeping keys by default, as this causes issues very often. I like your suggestion of adding a $preserveKeys flag defaulting to false (even if this is a minor BC break).
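
The same pitfall shows up with PHP's built-in iterator_to_array, whose second parameter (preserve_keys) defaults to true. A small sketch, where pages() is a made-up generator used only for illustration:

```php
<?php
// Hypothetical generator: each `yield from` restarts the integer keys at 0.
function pages(): \Generator {
    yield from ['a', 'b'];      // keys 0, 1
    yield from ['c', 'd', 'e']; // keys 0, 1, 2 again
}

// Default (keys preserved): later items overwrite earlier ones with the same key.
$withKeys = iterator_to_array(pages());
// $withKeys is ['c', 'd', 'e']

// Keys discarded: values are reindexed from 0 and nothing is lost.
$withoutKeys = iterator_to_array(pages(), false);
// $withoutKeys is ['a', 'b', 'c', 'd', 'e']
```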

nikic commented

Closed by #42.