reactphp/stream

Stream data into multiple CSV files and stream a ZIP file with these

stereomon opened this issue · 2 comments

I would like to export data from a fast-growing database table (assume more than 1 million entries). Additionally, I would like to have different CSV files with different data, for example orders.csv and order-items.csv. I don't want to pull all the data into memory, so I was thinking about streaming the data into CSV files and streaming those into a ZIP archive that is itself streamed. I can't find any documentation on how to achieve that, and I wasn't able to figure it out by trial and error. What I'm missing is a way to stream the CSV file streams into a ZIP file stream.

Is there anything you can point me to to achieve my goal or is it simply not possible? I guess I'm too focused on something that can't work to be able to come up with other ideas...

$loop = \React\EventLoop\Factory::create();

// $streamX should be replaced with something I can stream into a ZIP stream.
$outputStream1 = new \React\Stream\WritableResourceStream($stream1, $loop);
$csvStream1 = new \Clue\React\Csv\Encoder($outputStream1);

$outputStream2 = new \React\Stream\WritableResourceStream($stream2, $loop);
$csvStream2 = new \Clue\React\Csv\Encoder($outputStream2);

// This is what I'm looking for. I also looked into maennchen/zipstream-php, but it requires a temporary file.
$zip->addFromStream('file1.csv', $outputStream1);
$zip->addFromStream('file2.csv', $outputStream2);

// Queue the writes first; the loop then flushes them once it runs.
foreach ($rows as $row) {
    $csvStream1->write($row);
    $csvStream2->write($row);
}

$loop->run();

@clue I guess you can help me here. Would be nice to get your input.

clue commented

@stereomon Thanks for bringing this up, this is an interesting one!

To recap, you're trying to stream a large number of rows from a database into CSV file streams, which are in turn streamed into a compressed archive?

Most of this is indeed already possible, but I'm not aware of any streaming ZIP implementation that builds on top of ReactPHP at the moment. This would be completely possible and I would love to see one! (If this is a commercial project and you want me to take a look, please shoot me a mail and we'll get this sorted!)

A ZIP archive is essentially a continuous stream of (compressed) files, each with some metadata. Adding a new file to an archive isn't too hard, but because each file's data is stored as one contiguous block, you're going to have a hard time streaming multiple files into a single archive concurrently.

As a starting point, you may want to use ReactPHP's ChildProcess to temporarily compress independent archive files and then combine them into a single archive file.

As an alternative and depending on your use case, you may also keep these as separate archive files or use https://github.com/clue/reactphp-zlib to stream them into two independent orders.csv.gz and order-items.csv.gz files (GZIP != ZIP). You may also combine multiple files into a TAR archive before compressing and creating a dump.tar.gz (or dump.tgz) file once clue/reactphp-tar#2 is completed.
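
To make the "independent .gz files" idea concrete, here is a minimal sketch in plain, blocking PHP. It does not use ReactPHP or the zlib library linked above. Instead it relies on PHP's built-in compress.zlib:// stream wrapper (this needs the zlib extension) to compress each CSV row as it is written, so only the current row is held in memory. The writeGzippedCsv() and fakeOrders() names are made up for illustration, and the generator merely stands in for a database cursor.

```php
<?php
declare(strict_types=1);

/**
 * Write rows to a gzip-compressed CSV file, one row at a time.
 * The compress.zlib:// wrapper compresses on the fly while writing.
 * Returns the number of rows written.
 */
function writeGzippedCsv(string $path, iterable $rows): int
{
    $handle = fopen('compress.zlib://' . $path, 'wb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    $count = 0;
    foreach ($rows as $row) {
        // Separator, enclosure and escape are passed explicitly
        // instead of relying on the defaults.
        fputcsv($handle, $row, ',', '"', '\\');
        $count++;
    }
    fclose($handle);

    return $count;
}

/**
 * Hypothetical stand-in for a database cursor: yields one row at a time.
 */
function fakeOrders(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield [$i, "customer-$i", number_format($i * 1.5, 2, '.', '')];
    }
}

$dir = sys_get_temp_dir();
$written = writeGzippedCsv("$dir/orders.csv.gz", fakeOrders(1000));
```

Calling the function a second time with a different path and row source would produce order-items.csv.gz in the same way. Within a ReactPHP application, the equivalent would be to pipe the CSV encoder's output through a compressor stream from the library linked above, so that writes don't block the loop.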

I hope this helps 👍