brendan-duncan/archive

Out of memory exception

Closed this issue · 19 comments

My project recently updated to the most recent version, and we found that we can no longer decode our compressed files without getting out-of-memory exceptions. We narrowed it down to 3.3.3 as the version in which it stops working for us. While we can't say for certain, we think one commit introduced the changes that caused the breakage, and there are a few spots we suspect are most relevant:

ZipFile was updated to extend FileContent, which declares the content getter: 3dd6a05?diff=split#diff-3c83a4c65bd2f47bc1215c6d665a3fa4c3fc857f78112bea6b897095d0acaa22

The ArchiveFile constructor was updated so that, if the content is of type FileContent, that content getter is invoked in the constructor. 3dd6a05?diff=split#diff-8cf025b6f411b88d1e7911cb07765976f4f5f284b634342eeab10b342dbf75c9

What we think the problem is: everything is getting decompressed right away rather than when it's first accessed. While I can't provide the compressed file itself, for scale it's just under 6MB compressed.

Also for reference, this is what the method we're currently using looks like:
List<ByteData> readZip(final BigEndianBinaryReader reader) {
  List<ByteData> files = [];
  Uint8List zipData = reader.readBytes();
  Archive archive = ZipDecoder().decodeBytes(zipData.toList());
  for (final ArchiveFile file in archive.files) {
    files.add(ByteData.view(Uint8List.fromList(file.content as List<int>).buffer));
  }
  return files;
}

Any help or a work around would be much appreciated.

Thanks for letting me know. I've been doing a lot of work with memory and streaming recently, so I must have messed something up somewhere; the changes should have reduced memory use. I'm looking into this.

Oh, except 3.3.3 was a long time ago...that wouldn't be related to my recent changes. I'm still looking into it.

6MB compressed isn't that large. I've tested significantly larger.
What platform are you running on? Flutter? Web?

I ran through the code and it's not getting decompressed in the constructor. However, it does get decompressed if the verify arg is set to true in the ZipDecoder.decodeBytes call (as in ZipDecoder.decodeBytes(data, verify: true)). But verify is false by default, so your method shouldn't be doing zip verification. That does give me an idea for how to avoid decoding everything during verification, though.

You are storing all the decompressed data in memory anyway, so it wouldn't matter much whether it was decompressed eagerly by decodeBytes or not. However, in your code you are duplicating the decompressed memory.

Something you could try is not duplicating the decompressed memory.

for (final file in archive.files) {
  final bytes = file.content as Uint8List; // content is a Uint8List anyways, List<int> is just the interface
  files.add(ByteData.view(bytes.buffer));  // You were duplicating the decompressed data before
}
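
For what it's worth, ByteData.view doesn't copy anything; it just wraps the existing buffer. It's the Uint8List.fromList call that allocates a fresh list and copies every byte.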

I'm helping @RyanQuigleyOsi out.

Yes, we are doing a buffer copy. Funnily enough, the code you sent makes the problem worse somehow.

After digging into the code a bit more, what's happening is that Archive is parsing the zip we get from our server-side process as having max uint32 compressed and uncompressed sizes for the files contained therein. Prior to 3.3.3 this wasn't an issue, because the internal buffer in the InputStream that gets passed around is only sized to the actual data extracted from the zip file, and any new InputStreams are backed by the original buffers, and so are also sized to the data.

On Archive 3.3.3 and above, ZipFile was updated to be of type FileContent, and ArchiveFile was updated to check whether its content is of type FileContent; if so, when ArchiveFile.content is called, that getter invokes ZipFile.content. That call in turn invokes Inflate.buffer, passing the uncompressed size for the output stream, and Inflate.buffer then preallocates the output stream, in this case to max uint32.

The old path through ArchiveFile would invoke inflate_buffer.dart's inflateBuffer, which does not preallocate according to the uncompressed size.
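
To put a number on the difference, here's a minimal illustration (the sizes and names are ours, not the library's):

import 'dart:typed_data';

void main() {
  // Old path: the output buffer ends up sized to the data actually
  // inflated -- a few MB for a zip like ours.
  final actual = Uint8List(6 * 1024 * 1024); // fine

  // New path with this zip: preallocated to the header's size field,
  // which holds the Zip64 sentinel 0xFFFFFFFF (~4 GiB).
  final claimed = Uint8List(0xFFFFFFFF); // out of memory on most setups

  print('${actual.length} vs ${claimed.length}');
}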

I've got some more digging to do into whether we're sending a malformed zip from the server or not, but I figured I'd give you an update with where I'm at.

Alright, I think I've triaged it to the point where I can identify the issue at hand, but I couldn't tell you which piece of code is wrong. We're trying to process a Zip64 encoded zip, and Archive is breaking on parsing the extra field.

[screenshot: the extra-field parsing code, with the misbehaving section outlined]

The section of code outlined is misbehaving. The way it actually functions is:

The local variable extra is an InputStream whose buffer is a view on input.buffer, with start and offset set to the location of the extra field. However, neither the invocation of readBytes that creates extra, nor the following extra.toUint8List(), moves extra.offset at all. Thus, when extra.rewind(extraLen) is invoked, it moves the cursor (extra.offset) back extraLen bytes. This means that when an attempt is made to read id from extra, it's reading something else entirely.

What I don't know is whether readBytes or toUint8List is supposed to advance the cursor, whether rewind is supposed to guard against putting offset behind start, whether the extra.rewind call should be omitted entirely, or whether something else is supposed to be done here. But that's the gist.
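
To make the cursor mechanics concrete, here's a toy model of the behavior described above; the class and member names are ours for illustration, not the library's API:

import 'dart:typed_data';

class ToyStream {
  final Uint8List buffer;
  final int start; // beginning of this stream's window into buffer
  int offset;      // read cursor
  ToyStream(this.buffer, this.start) : offset = start;

  // Like the readBytes/toUint8List pair above: returns a view without
  // advancing offset.
  Uint8List bytesAt(int len) =>
      Uint8List.sublistView(buffer, offset, offset + len);

  // No guard: nothing stops offset from landing before start.
  void rewind(int len) => offset -= len;
}

void main() {
  final extra = ToyStream(Uint8List(32), 16); // extra field starts at 16
  extra.bytesAt(8); // offset is still 16
  extra.rewind(8);  // offset is now 8 -- before start
  print(extra.offset < extra.start); // true: the next read is garbage
}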

Let me know if there's any more clarification needed on our part.

Thanks for digging into it more. That code has been rewritten since the version you're looking at. Currently it's

      if (extraLen > 0) {
        final extraBytes = input.readBytes(extraLen);
        extraField = extraBytes.toUint8List();

        final extra = InputStream(extraField);
        while (!extra.isEOS) {
          final id = extra.readUint16();
          var size = extra.readUint16();
          final extraBytes = extra.readBytes(size);
          if (id == 1) {
            // Zip64 extended information

toUint8List does not change the read position; it returns the bytes for the remainder of the stream. So the version you're looking at is buggy.

So, same basic problem but a different file. ZipFile doesn't handle the 0xFFFFFFFF sentinel for the Zip64 sizes, and also doesn't set the compressed or uncompressed size based on the passed-in ZipFileHeader. This means that when ZipFile.content is called, an OutputStream of size 0xFFFFFFFF is being initialized.
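
For reference, a 32-bit size of 0xFFFFFFFF is a sentinel meaning the real 64-bit value lives in the Zip64 extended-information extra field (header ID 0x0001). A sketch of the expected handling, using illustrative names and assuming a readUint64 helper on InputStream rather than quoting the library's actual code:

const zip64Sentinel = 0xFFFFFFFF;

void applyZip64(InputStream extra, ZipFileHeader header) {
  // Per the zip spec, the 64-bit values appear in this order, and only
  // for the fields that held the sentinel in the fixed header.
  if (header.uncompressedSize == zip64Sentinel) {
    header.uncompressedSize = extra.readUint64();
  }
  if (header.compressedSize == zip64Sentinel) {
    header.compressedSize = extra.readUint64();
  }
}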

Are you limited to the old version of the library? The Zip64 code has been rewritten since that older version. If you are limited to it, you would have to branch the source and cherry-pick in the Zip64 changes.

That last comment was triaging 3.4.5. I don't believe we're limited to 3.3.8.

Thanks for the info, I'll make sure to dig into that. 0xFFFFFFFF indicates it's a zip64 size, and it should have been updated after it read the zip64 block. That it's not doing that is a problem with the reader, or something in the zip it's not handling correctly.

Any chance you could send me the problematic zip?

Unfortunately I cannot. If I get some time I'll see if I can generate something that will exhibit the same behavior.

If you can come up with a manufactured zip, that would be helpful so I can reproduce the issue and know I've fixed it. I will still be looking into it even if you can't provide a zip; it just might take some back-and-forth to make sure I've got it solved.

archive.zip

I believe this replicates the problem. I created this zip using the zip utility: zip -fz archive.zip first.txt second.txt third.txt

Note that to fully experience the problem you will need to call ZipFile.content at some point.
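
A minimal sketch of how to trigger it, assuming the attached archive.zip is in the working directory:

import 'dart:io';

import 'package:archive/archive_io.dart';

void main() {
  final bytes = File('archive.zip').readAsBytesSync();
  final archive = ZipDecoder().decodeBytes(bytes);
  for (final file in archive.files) {
    // Accessing content is what triggers the inflate; on the affected
    // versions it tries to preallocate a 0xFFFFFFFF-byte output buffer.
    final content = file.content as List<int>;
    print('${file.name}: ${content.length} bytes');
  }
}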

Thank you! I'll look into it.

I see the issue and am working on a fix.

It should be fixed now and I published the fix in 3.4.6. Sorry about that problem, and thanks for the report and the help to diagnose it! I have little time to work on these projects, so the help to identify it was very much appreciated, it made tracking down the issue a lot faster.

Happy to help, and thanks for the quick turnaround. I'll try 3.4.6 tomorrow and see if it fixes the issue on our end.

Looks like we're all good on our end now. Thanks again.