Tar file is empty with a size of zero bytes for small tar entry sizes

Question


HofmeisterAn opened this issue · 4 comments

Describe the bug

The linked reproducer code shows an issue that occurs when a tar entry is very small, just a few bytes. When creating a tarball from a collection of small files, the tar file turns out to be empty, with a size of zero bytes (calling Flush() etc. does not help). If the size of the tar entries is increased, the problem does not occur and the tar file is created correctly.

Reproduction Code

https://dotnetfiddle.net/QLhzBV

Steps to reproduce

Run the .NET Fiddle reproducer

Expected behavior

The tar file should have a size greater than 0 bytes.

Operating System

Windows, macOS, Linux

Framework Version

.NET 7, .NET 6

Additional context

If you change the multiplier in the linked reproducer, for example to 10 (at line 12), the test will execute successfully. Furthermore, if the TarOutputStream does not own the underlying stream, it also works properly:

IsStreamOwner = false;
...
Close();
Assert.True(_stream.Length > 0, "The tar file has a size of zero bytes."); // Runs successfully.

Answer 1 · 2023-06-16T14:46:33.000Z

The tar data is not fully written until the TarOutputStream is closed (or until its internal buffer is flushed, which happens when the entry content is large enough). The .Length is the number of bytes that have been written to the underlying output stream so far, which doesn't necessarily correlate with the number of bytes written to the TarOutputStream.
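A minimal sketch of the buffering described above, assuming SharpZipLib 1.4.x (namespace ICSharpCode.SharpZipLib.Tar), whose internal buffer by default collects 20 blocks of 512 bytes (one 10240-byte record) before writing to the underlying stream. The entry name and content are made up for illustration, and IsStreamOwner = false is set only so the MemoryStream can still be inspected after Close().

```csharp
// Sketch: assumes the SharpZipLib NuGet package (1.4.x) is referenced.
using System;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.Tar;

var ms = new MemoryStream();
// IsStreamOwner = false only so that ms can still be inspected after Close().
var tar = new TarOutputStream(ms, Encoding.UTF8) { IsStreamOwner = false };

byte[] data = Encoding.UTF8.GetBytes("hello"); // a tiny 5-byte entry (made-up content)
var entry = TarEntry.CreateTarEntry("hello.txt"); // hypothetical entry name
entry.Size = data.Length; // the size in the header must match the bytes written
tar.PutNextEntry(entry);
tar.Write(data, 0, data.Length);
tar.CloseEntry();

// The header block and the padded data block are still sitting in the
// internal record buffer, which is only written out once it is full (or on Close).
long lengthBeforeClose = ms.Length;

tar.Close(); // finishes the archive and writes the final, padded record
long lengthAfterClose = ms.Length;

Console.WriteLine($"before Close: {lengthBeforeClose}, after Close: {lengthAfterClose}");
```

With an entry this small, the first value is expected to be 0 and the second a non-zero multiple of 512, which matches the behavior reported in the issue.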

Furthermore, if the TarOutputStream does not own the underlying stream, it also works properly.

Yes, that is how you should use it with a MemoryStream.
The underlying stream is normally closed to avoid leaking stream handles, but if you want to opt out of that, set IsStreamOwner = false and take responsibility for disposing of the stream yourself.
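One way to apply this advice, as a sketch assuming SharpZipLib 1.4.x: a hypothetical BuildTar helper lets a using block finish the archive, while the caller takes ownership of the still-open MemoryStream (rewound to the start) and disposes of it.

```csharp
// Sketch: assumes the SharpZipLib NuGet package (1.4.x) is referenced.
using System;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.Tar;

// Hypothetical helper: builds a tar archive in memory and returns the
// still-open MemoryStream, rewound so it can be read from the beginning.
static MemoryStream BuildTar(params (string Name, string Content)[] files)
{
    var ms = new MemoryStream();
    // IsStreamOwner = false: disposing the tar stream finishes the archive
    // but leaves ms open, so the caller becomes responsible for disposing it.
    using (var tar = new TarOutputStream(ms, Encoding.UTF8) { IsStreamOwner = false })
    {
        foreach (var (name, content) in files)
        {
            byte[] data = Encoding.UTF8.GetBytes(content);
            var entry = TarEntry.CreateTarEntry(name);
            entry.Size = data.Length;
            tar.PutNextEntry(entry);
            tar.Write(data, 0, data.Length);
            tar.CloseEntry();
        }
    } // Dispose writes the end-of-archive blocks and flushes the last record into ms.

    ms.Position = 0; // rewind before handing the stream to a reader or uploader
    return ms;
}

using var archive = BuildTar(("a.txt", "a"), ("b.txt", "bb"));
Console.WriteLine($"archive length: {archive.Length}");
```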

Answer 2 · 2023-06-16T14:58:31.000Z

The tar data is not fully written until the TarOutputStream is closed (or when the buffer is flushed, which happens when the entry content is large enough).

I can not use a closed stream anymore (and for the open stream the data is not flushed). This becomes very difficult and inconvenient when dealing with small tar entries. Passing in my own stream (with the TarOutputStream not set as the stream owner) works as a workaround, but it is inconvenient for developers, and they may not even be aware of this requirement in the first place.

Answer 3 · 2023-06-16T15:06:57.000Z

I can not use a closed stream anymore (and for the open stream the data is not flushed).

I don't know what you are trying to do, but you cannot create a valid tar file without closing it, since it needs to add the EOF blocks to the end. If you are extending the TarOutputStream like you are doing in the reproduction code example, then why don't you instead use a TarOutputStream?

I also don't understand how adding IsStreamOwner = false and then using the MemoryStream would be any less convenient. It doesn't matter how large the entries are: after the EOF blocks, no new entries can be added, which is why the stream is closed.
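To see that closing is what writes the end-of-archive marker, here is a sketch (again assuming SharpZipLib 1.4.x) that closes a TarOutputStream without adding any entries. The result is not zero bytes but a run of 512-byte blocks containing only zeros, which tar readers treat as the end of the archive.

```csharp
// Sketch: assumes the SharpZipLib NuGet package (1.4.x) is referenced.
using System;
using System.IO;
using System.Linq;
using System.Text;
using ICSharpCode.SharpZipLib.Tar;

var ms = new MemoryStream();
var tar = new TarOutputStream(ms, Encoding.UTF8) { IsStreamOwner = false };
tar.Close(); // no entries: only the zero-filled end-of-archive blocks are written

byte[] bytes = ms.ToArray();
bool allZero = bytes.All(b => b == 0);
Console.WriteLine($"{bytes.Length} bytes, all zero: {allZero}");
```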

Updated .NET Fiddle with suggested usage:
https://dotnetfiddle.net/cAH5F2

Answer 4 · 2023-06-17T14:27:46.000Z

since it needs to add the EOF blocks to the end.

Yes, you are correct. There is something I overlooked. For some reason, I thought that closing the stream would only add the EOF marker. However, since I need to send the stream to an HTTP endpoint, I cannot simply close it and be finished like I would usually do with a file stream.

I also don't understand how adding IsStreamOwner = false and then using the MemoryStream would be any less convenient.

Being able to finalize the tar stream (similar to CloseEntry) while keeping the underlying stream open would save a few lines of code. I may have been distracted by noticing that a few bytes were neither written nor flushed unless the stream was closed. By the way, the suggested IsStreamOwner = false approach is exactly what I currently do. Thank you for your response and clarification.