brendan-duncan/archive

[Question] Support for XFile stream.

Opened this issue · 7 comments

With the current implementation, opening a huge archive file on the web requires loading the whole file into RAM, which is not ideal. The issue is apparently solved on native platforms via InputFileStream from archive:io. On the web, however, it's not as straightforward, though it seems possible.

I did some research, and the File class in the web and native implementations are different entities. So the Flutter team created XFile (aka cross_file) to make it easier to handle files across platforms, and it seems to support random reads with the following method:

Stream<Uint8List> openRead([int? start, int? end])

However, you might notice that the reads are async, as required by web standards to avoid blocking the main thread. I'm trying to implement this by imitating InputFileStream. Do you think it's possible, or do you foresee any issues with that?
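To sketch what I mean, here's a minimal helper (my own, assuming `XFile.openRead` behaves as documented in cross_file; `collectBytes` is a hypothetical name, not part of any package) that collects the chunks of an `openRead`-style stream into a single buffer:

```dart
import 'dart:typed_data';

/// Collects the chunks of an openRead-style stream into one buffer.
/// In practice `source` would be `xfile.openRead(start, end)`.
Future<Uint8List> collectBytes(Stream<Uint8List> source) async {
  final chunks = <Uint8List>[];
  var length = 0;
  await for (final chunk in source) {
    chunks.add(chunk);
    length += chunk.length;
  }
  // Concatenate the chunks into a single contiguous buffer.
  final out = Uint8List(length);
  var offset = 0;
  for (final chunk in chunks) {
    out.setRange(offset, offset + chunk.length, chunk);
    offset += chunk.length;
  }
  return out;
}
```

The catch is exactly the one above: the result is a Future, so everything downstream of it has to be async too.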
@brendan-duncan

I don't know anything about XFile, but it sounds promising. If you can get it to work, that would be fantastic.

@brendan-duncan Hopefully! I'm stuck looking for a workaround to turn the browser's async reads into the synchronous readIntoSync that FileHandle requires.

A note about that. One of the projects I'm working on is rewriting this library, because it's old and could use a fresh redesign. One of the experiments I did that didn't work out was to use async for all file IO.

InputFileStream uses buffered input, meaning that when you call a read method, it pulls the bytes out of a buffer, and if the buffer doesn't have enough bytes, it refills the buffer from the file, because file IO is very slow.

My experiment was to make that "fillBuffer" use async file IO. But that meant making the read* methods async, because they could call the async fillBuffer, which in turn meant that every caller of a read method had to await it and be async itself. I sorted through all of that and got everything async and awaiting as necessary.

The problem is that performance dropped by an order of magnitude or more: a 250k zip file took 20 seconds to decode, whereas it took about a second the old synchronous way. In Dart, every await suspends the function and schedules a microtask to resume it, and that machinery has real startup and teardown cost. At the frequency of an await for every readByte call, it killed Dart's performance. I had to remove all of that from my work-in-progress redesign.
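To make that overhead concrete, here's a toy benchmark (my own illustration, not code from archive) that reads a megabyte byte-by-byte from an in-memory buffer, once through a sync method and once through an equivalent async one. The only difference between the two loops is the await:

```dart
import 'dart:typed_data';

/// Toy reader with a sync and an equivalent async byte-read path.
class ToyBuffer {
  ToyBuffer(this._data);
  final Uint8List _data;
  int _pos = 0;

  int get position => _pos;

  int readByteSync() => _data[_pos++];

  // Identical work, but wrapped in a Future so every call must be awaited.
  Future<int> readByteAsync() async => _data[_pos++];
}

Future<void> main() async {
  final data = Uint8List(1 << 20); // 1 MiB of zeros

  final sw = Stopwatch()..start();
  final syncReader = ToyBuffer(data);
  while (syncReader.position < data.length) {
    syncReader.readByteSync();
  }
  print('sync : ${sw.elapsedMilliseconds} ms');

  sw.reset();
  final asyncReader = ToyBuffer(data);
  while (asyncReader.position < data.length) {
    await asyncReader.readByteAsync();
  }
  print('async: ${sw.elapsedMilliseconds} ms');
}
```

The async loop pays the suspend/resume cost a million times, which is the per-await overhead described above multiplied out.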

So if you can find a way to do async file reads at a level where you can still do buffered reads without drastically hurting performance, I'm very interested in hearing what you come up with. I'm not a Dart expert; I just have these libraries from a long time ago, so I'm interested in figuring out what I was doing wrong and how to improve it.
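One possible shape for that, sketched below under my own assumptions (all names here are hypothetical; `_fill` stands in for whatever async source the platform provides, e.g. `XFile.openRead` on the web): await only when the buffer runs dry, so the per-byte hot path stays synchronous.

```dart
import 'dart:typed_data';

/// Sketch: one await per buffer refill instead of one await per byte.
class BufferedAsyncReader {
  BufferedAsyncReader(this._fill, {int chunkSize = 64 * 1024})
      : _chunkSize = chunkSize;

  /// Async byte source: reads up to [length] bytes starting at [start].
  final Future<Uint8List> Function(int start, int length) _fill;
  final int _chunkSize;

  Uint8List _buffer = Uint8List(0);
  int _pos = 0; // read position inside _buffer
  int _filePos = 0; // file offset of _buffer[0]

  /// Await once per refill; callers batch this before a run of sync reads.
  /// (A real implementation would also loop on short reads and handle EOF.)
  Future<void> ensure(int count) async {
    final remaining = _buffer.length - _pos;
    if (remaining >= count) return;
    final fetched = await _fill(_filePos + _buffer.length, _chunkSize);
    // Carry over the unread tail of the old buffer, then append new bytes.
    final next = Uint8List(remaining + fetched.length);
    next.setRange(0, remaining, _buffer, _pos);
    next.setRange(remaining, next.length, fetched);
    _filePos += _pos;
    _buffer = next;
    _pos = 0;
  }

  /// Synchronous hot path: no await, no microtask.
  int readByte() => _buffer[_pos++];
}
```

A decoder would then `await reader.ensure(n)` once per record or chunk and use plain synchronous reads in between, which keeps the await count proportional to the number of refills rather than the number of bytes.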

I was just about to try implementing it to see whether async reads would impact performance. I'm no expert either, but it might be a good idea to consult somebody from the Dart team. I also wonder whether performance would improve with Dart 3's new optimizations, but I'm not optimistic about it, since async/random reads are generally slower than sync/sequential ones.

Since async reads (on the main thread) are only a browser limitation, I've decided to implement the decoders with JS packages instead as a workaround for now. Once I have more time to learn about Web Workers, I'll probably (and should) implement an InputFileStream for the web target.

Just an update: I've managed to use JsZip as a backend, and it decodes a 1.5 GB zip file in 2-3 seconds in the browser (which is good enough for my use case).

I did talk with a Dart developer, but there was no magic optimization for the way I was doing things.

In the redesign I'm working on (slow progress, too many projects and work), I had some thoughts on redesigning the whole stream class to reduce the number of await calls needed while still allowing the file IO to be async, but I haven't gotten around to implementing that experiment yet.

I'm glad you found a solution for your situation.

Is it on another branch I might be able to look at?