icsharpcode/SharpZipLib

Exception thrown (rarely) in TarHeader with empty array

DiogoRoloOS opened this issue · 5 comments

I have a unit test that goes though some SharpZipLib methods.
Every once in a while it fails in a VM which the following error:

ModTime cannot be before Jan 1st 1970 (Parameter 'value')
at ICSharpCode.SharpZipLib.Tar.TarHeader.set_ModTime(DateTime value)
at ICSharpCode.SharpZipLib.Tar.TarHeader.ParseBuffer(Byte[] header, Encoding nameEncoding)
at ICSharpCode.SharpZipLib.Tar.TarInputStream.GetNextEntryAsync(CancellationToken ct, Boolean isAsync)
at ICSharpCode.SharpZipLib.Tar.TarInputStream.GetNextEntry()

The code that is throwing that exception is the below one and is ALWAYS called with input Array.Empty():

public static byte[] Dummy(this byte[] data) {
    using var inStream = new MemoryStream(data);
    using var outStream = new MemoryStream();
    using var gzipStream = new GZipInputStream(inStream);
    using var tarInputStream = new TarInputStream(gzipStream, Encoding.UTF8);
    while (tarInputStream.GetNextEntry() is { } tarEntry) {
    }

    return outStream.ToArray();
}

The "fun" part is that this NEVER happens in my local machine. Only in the CICD pipeline, once in a while.
The CICD pipeline is a newly booted ubuntu-20.04 container on Azure Devops.

Steps to reproduce

  1. Boot a ubuntu-20.04 container (freshly booted?)
  2. Run the function
  3. Repeat until it happens

Expected behavior

A dummy empty tar.gz archive should be enumerated

Actual behavior

An exception is (rarely) thrown.

Version of SharpZipLib

1.4.0

Obtained from (only keep the relevant lines)

  • Package installed using NuGet

I don't really understand what this is supposed to accomplish... Running Dummy(Array.Empty<byte>()) should just return byte[0].

In any case, the bug is probably caused by this remark.
The array pool returns an array and when it's being used to read data from the stream, if the stream is empty, no new data is copied to the buffer.
This means that whatever data was still present in the rented array was parsed as an (invalid) Tar header.
This should be fixed by "padding" the input stream with null bytes instead of just breaking in

if (numBytes <= 0)
{
break;
}

Repro

https://dotnetfiddle.net/Drl27x

Note: Causes System.ArgumentOutOfRangeException: Cannot be less than zero (Parameter 'value') instead due to the garbage data being 0x8b, but it's still caused by the same bug.

I don't really understand what this is supposed to accomplish... Running Dummy(Array.Empty()) should just return byte[0]

This piece of code is a small snippet that resides in a bigger piece of code. The Array.Empty is just a mock to simplify the logic. I agree with you that it should always return byte[0] but sometimes, the hasHitEOF inside TarInputStream.cs is false, for some reason.

It's definitively caused by the re-use of a shared array pool array. The array is not zeroed, so when reading from an empty source it will treat the "garbage" data in the array as input.

That makes sense. Thank you for the explanation.