Async entry read causes exception when reading AES encrypted zip.
rilehudd opened this issue · 8 comments
Describe the bug
When trying to read an encrypted archive's entry's input stream asynchronously, it reports an exception containing message: "Auth code missing from input stream".
Yet, if the same archive is read synchronously it works fine.
Reproduction Code
https://dotnetfiddle.net/RhJM7o
Steps to reproduce
It doesn't seem to happen for every input. But for inputs where it does happen, changing the async call to the synchronous version fixes the problem.
The zip input is in the dotnetfiddle.
The steps to create the zip archive from 7-Zip are:
- Using the file provided (appendix A)
- 7-Zip -> Add to archive
- Ensure params:
- Archive Format: Zip
- Compression level: 5 - Normal
- Compression method: * Deflate
- Encryption method: AES-256
- Set password (I set mine to "test")
- Run the decompress/decrypt steps outlined in the dotnetfiddle
Expected behavior
For the same input, I expected them both to work (the synchronous and the asynchronous).
Operating System
Windows
Framework Version
.NET 6
Tags
ZIP, Encryption, Async
Additional context
No response
Thanks for the report and for actually using the dotnet fiddle template!
I compacted it somewhat to emphasise that the only difference is the async/sync calls: https://dotnetfiddle.net/5alzjK
ArchiveDiag report for the file (nothing stands out afaikt):
https://archivediag.piksel.se/reports/asyncaes.zip-1681140301148006.html
I have not found I cause yet, but I will look into this as soon I can.
Yeah, this is yet another time that the framework bypasses the public methods of CryptoStream
for increased performance.
ZipAESStream
overrides the Read
and ReadAsync
methods of CryptoStream
, but as you can see from the
Stacktrace
at ICSharpCode.SharpZipLib.Encryption.ZipAESTransform.TransformFinalBlock(Byte[] inputBuffer, Int32 inputOffset, Int32 inputCount) in /_/src/ICSharpCode.SharpZipLib/Encryption/ZipAESTransform.cs:line 141
at System.Security.Cryptography.CryptoStream.ReadAsyncCore(Memory`1 buffer, CancellationToken cancellationToken, Boolean useAsync)
at System.Security.Cryptography.CryptoStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken)
at System.IO.StreamReader.ReadBufferAsync(CancellationToken cancellationToken)
at System.IO.StreamReader.ReadToEndAsyncInternal()
at Program.<<Main>$>g__Repro|0_0(String type)
at Program.<<Main>$>g__TestAsync|0_2[T](Func`2 a, T input)
, StreamReader
just bypasses ReadAsync
and calls ReadAsyncCore
directly, which bypasses our read logic. That method is internal and cannot be overridden by us.
This is essentially #572 all over again, but with no clear solution.
As for consumers (including @rilehudd), you can replace the StreamReader
with:
var entry = zip.GetEntry("abcdefgh.txt");
var entryStream = zip.GetInputStream(entry);
var entryBuffer = new byte[entry.Size];
using (var ms2 = new MemoryStream(entryBuffer))
{
byte[] buffer = new byte[81920];
int read;
while ((read = await entryStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
ms2.Write(buffer, 0, read);
}
var output = Encoding.UTF8.GetString(entryBuffer);
// ... use output ...
}
It's not ideal, for sure, but that is the only async path that avoids the optimisations in .NET that bypasses the AES encryption logic.
Is there a way we can detect this "bypassed" state an compensate for this optimization?
For example detect that ZipAESTransform.TransformFinalBlock has been called from the non-expected path (to implement what CryptoStream is expecting). I'm may not be understanding the feasibility of attempting to compensate for the optimization. Or maybe it is possible but would result in unacceptable complexity. (I wonder what dotnet designers expected implementors to do with them bypassing public methods).
Otherwise, it seems like this might be an optimization that might likely affect many encryption stream implementations. Do you think it makes sense to submit this optimization bug upstream to dotnet?
Sorry didn't intend to close this issue (wanted to ensure you see this and have an opportunity to respond so I might see what your thoughts are).
I think the main issue is that extending CryptoStream
is an unintended way to implement decryption and I don't know why it was done this way.
It is currently unrecoverable, since part of the auth code gets consumed by TransformBlock
. The ZipAESStream
uses a sliding buffer for this that should be possible to move over to the transform.
Thanks for taking the time to take a look at this!
There is another solution to this. The compression method that is used for the entry is STORE, which means that it's uncompressed. This problem only happens to such entries because we directly return the ZipAESStream
to the consumer. If there is some decompression to be done, the stream will be wrapped in the appropriate decompression stream, and the optimized path is not taken. We could add a StoreOutputStream
that just passes the data through, but in doing so mitigates this problem.