adamhathcock/sharpcompress

ArchiveFactory fails with this "tar.gz" file

jbrockerville opened this issue · 5 comments

I made the "archive-linux-tar.tar.gz" file with tar -czf archive.tar.gz tar.gz.txt (attachment renamed), and when I open it with ArchiveFactory the archive contains an entry with a null Key. I think it might be using GZipArchive instead of TarArchive?

I then made "archive-linux-tar-gzip.tar.gz" with tar -cf archive.tar tar.gz.txt followed by gzip -5 archive.tar (attachment renamed), and that was fine.

NGL, I'm not a huge Linux guy, so I'm assuming both methods of creating a "tar.gz" are valid.

archive-linux-tar.tar.gz
archive-linux-tar-gzip.tar.gz

Erior commented

Tar files are forward-reading archives (think old tape backup). Try out the ReaderFactory; I think it will work much better for you.

I do see that detection for file.tar. is not really well supported in the ArchiveFactory; this could be improved. I'm not sure what GZipArchive is supposed to do, since gzip is not really an archive format as such.
Perhaps @adamhathcock can give advice on what is expected.

gzip as such may not contain a name for the file you compressed; check the "-N, --name" option. You can run "man gzip" or search the web for more info on that.
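To make the point about the optional name concrete, here is a minimal sketch in Python (standard library only, not SharpCompress) that reads the FNAME field from a gzip header. The helper names `gzip_stored_name` and `make_gzip` are made up for this illustration; the flag values and header layout follow the gzip format as I understand it from RFC 1952.

```python
import gzip
import io
import struct

FEXTRA = 0x04  # header flag: optional extra field present
FNAME = 0x08   # header flag: original file name present


def gzip_stored_name(data: bytes):
    """Return the file name stored in a gzip header, or None if absent."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    pos = 10  # the fixed part of the header is 10 bytes long
    if flags & FEXTRA:
        # The extra field comes before the name: 2-byte length, then payload.
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if flags & FNAME:
        end = data.index(b"\x00", pos)  # the name is zero-terminated
        return data[pos:end].decode("latin-1")
    return None


def make_gzip(payload: bytes, name: str) -> bytes:
    """Compress payload in memory; an empty name leaves FNAME unset."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf) as gz:
        gz.write(payload)
    return buf.getvalue()


with_name = make_gzip(b"tar bytes would go here", "archive.tar")
without_name = make_gzip(b"tar bytes would go here", "")

print(gzip_stored_name(with_name))     # name is present in the header
print(gzip_stored_name(without_name))  # no name stored
```

A reader that uses the gzip header name as the entry key has nothing to report in the second case, which would be consistent with the null Key described in this issue.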

Thanks for replying @Erior. The ReaderFactory works for all the different tar-gz files I made. However, I'm not using the ReaderFactory because using the ArchiveFactory gets me 7Zip support. The TarArchive class should handle the file I made, according to FORMATS.md anyway.

> gzip as such may not contain a name for the file you compressed

Did you maybe get that backwards? The one-step file made with only tar has the null Key. The two-step file made with tar and then gzip is handled just fine.

Erior commented

The problem is that it is not detected as a tar archive. If you skip the name and just open the internal entry stream again you would get the tar archive.
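The "open the internal stream again" idea can be sketched outside SharpCompress with Python's standard library: undo the gzip layer first, then treat the result as a plain tar. The function names `build_tar_gz` and `list_tar_inside_gzip` are hypothetical helpers for this example, and the sample file is built in memory rather than taken from the attachments.

```python
import gzip
import io
import tarfile


def build_tar_gz(member_name: str, content: bytes) -> bytes:
    """Build a small tar in memory, then gzip it as a separate step."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return gzip.compress(tar_buf.getvalue())


def list_tar_inside_gzip(data: bytes) -> list:
    """Undo the gzip layer, then open the inner bytes as an uncompressed tar."""
    inner = gzip.decompress(data)  # layer 1: compression only
    # layer 2: the actual archive; "r:" accepts only an uncompressed tar
    with tarfile.open(fileobj=io.BytesIO(inner), mode="r:") as tar:
        return tar.getnames()


sample = build_tar_gz("tar.gz.txt", b"hello\n")
names = list_tar_inside_gzip(sample)
print(names)
```

The entry names come from the tar layer, so they are available whether or not the gzip header carries a name.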

For the second part, if you did "gzip -5n" you would get the same "no name" scenario for both streams.

> if you skip the name and just open the internal entry stream again you would get the tar archive.

Yeah, but I don't want to do that. But that's just my preference in this specific scenario--I want all the entries to have names. But a null Key is prolly the same design decision I'd make here. Oh well. I'll work around it.

> If you did "gzip -5n" you would get the same "no name" scenario for both streams.

Hmm... I guess that's what tar is doing with the -z flag. Prolly the other compression options, too. Seems valid then.

Given all that, I guess this isn't a bug. Closing. Thanks for the discussion.

gzip is compression wrapped around tar, which is just a file format. You can't really have random access into a tar.gz.
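As a rough illustration of forward-only access, this sketch uses Python's tarfile in stream mode ("r|gz"), which reads entries strictly in order without seeking. It is an analogy to the reader-style approach suggested earlier in the thread, not SharpCompress code, and the helpers `build_tar_gz` and `read_forward_only` are invented for the example.

```python
import io
import tarfile


def build_tar_gz(files: dict) -> bytes:
    """Create a gzip-compressed tar in memory from a name -> bytes mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def read_forward_only(data: bytes) -> dict:
    """Read every entry in order using stream mode, which never seeks back."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|gz") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is not None:  # None for non-file entries
                # Each entry must be consumed before moving on to the next.
                result[member.name] = extracted.read()
    return result


contents = read_forward_only(
    build_tar_gz({"a.txt": b"first", "b.txt": b"second"})
)
print(sorted(contents))
```

In stream mode each entry has to be read when it is reached; going back to an earlier entry would mean decompressing from the start again.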

There is a header for gzip that may contain the filename of the tar, but it's not required. I haven't looked at the gzip file format for years.

Hope this helps