RKrahl/archive-tools

Failure from verify if the archive contains a directory with long path name

With Python 3.7 and older, an archive fails verification if it contains a directory with a long path name:

>>> import os
>>> from pathlib import Path
>>> import sys
>>> import tempfile
>>> from archive.archive import Archive
>>> 
>>> base = tempfile.mkdtemp(prefix="tarfile-test-")
>>> os.chdir(base)
>>> 
>>> sys.version_info
sys.version_info(major=3, minor=7, micro=10, releaselevel='final', serial=0)
>>> 
>>> dirname = Path("lets_start_with_a_somewhat_long_directory_name_"
...                "because_we_need_a_very_long_overall_path")
>>> os.mkdir(dirname)
>>> 
>>> subdir1 = dirname / "sub-1"
>>> subdir2 = dirname / "sub-directory-2"
>>> os.mkdir(subdir1)
>>> os.mkdir(subdir2)
>>> len(str(subdir1))
93
>>> len(str(subdir2))
103
>>> 
>>> archive_path = Path("sample.tar")
>>> 
>>> Archive().create(archive_path, paths=[dirname])
<archive.archive.Archive object at 0x7f7aa8452150>
>>> 
>>> with Archive().open(archive_path) as archive:
...     archive.verify()
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/abuild/test/archive-tools-0.5.2.dev90+g8f613d1/build/lib/archive/archive.py", line 292, in verify
    self._verify_item(fileinfo)
  File "/home/abuild/test/archive-tools-0.5.2.dev90+g8f613d1/build/lib/archive/archive.py", line 304, in _verify_item
    raise ArchiveIntegrityError("%s: missing" % itemname)
archive.exception.ArchiveIntegrityError: /tmp/tarfile-test-jog7k657/sample.tar:lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path/sub-directory-2: missing

The cause of the issue becomes apparent if we look at the members of the tarfile. Note the spurious trailing forward slash in the name of subdir2:

>>> with Archive().open(archive_path) as archive:
...     for ti in archive._file.getmembers():
...         print(ti.name)
... 
lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path/.manifest.yaml
lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path
lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path/sub-1
lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path/sub-directory-2/
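
To illustrate why the trailing slash matters, here is a minimal sketch of a helper that normalizes member names before they are compared against other paths. The name `normalized_name` is hypothetical and not part of archive-tools, and this is not necessarily how the issue was fixed in the package. It only shows that stripping the slash from directory entries would make the two spellings compare equal.

```python
import tarfile


def normalized_name(tarinfo):
    """Return the member name without trailing slashes for directories.

    Hypothetical helper for illustration only; not part of archive-tools.
    """
    if tarinfo.isdir():
        return tarinfo.name.rstrip("/")
    return tarinfo.name


# Build a TarInfo by hand that mimics the spurious entry shown above.
ti = tarfile.TarInfo("some/long/directory/")
ti.type = tarfile.DIRTYPE
print(normalized_name(ti))
```

Non-directory members are returned unchanged, so a regular file is never renamed by accident.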

The problem does not occur with Python 3.8 and newer:

>>> import os
>>> from pathlib import Path
>>> import sys
>>> import tempfile
>>> from archive.archive import Archive
>>> 
>>> base = tempfile.mkdtemp(prefix="tarfile-test-")
>>> os.chdir(base)
>>> 
>>> sys.version_info
sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
>>> 
>>> dirname = Path("lets_start_with_a_somewhat_long_directory_name_"
...                "because_we_need_a_very_long_overall_path")
>>> os.mkdir(dirname)
>>> 
>>> subdir1 = dirname / "sub-1"
>>> subdir2 = dirname / "sub-directory-2"
>>> os.mkdir(subdir1)
>>> os.mkdir(subdir2)
>>> len(str(subdir1))
93
>>> len(str(subdir2))
103
>>> 
>>> archive_path = Path("sample.tar")
>>> 
>>> Archive().create(archive_path, paths=[dirname])
<archive.archive.Archive object at 0x7f4c4b3ee1f0>
>>> 
>>> with Archive().open(archive_path) as archive:
...     archive.verify()
... 
>>> with Archive().open(archive_path) as archive:
...     for ti in archive._file.getmembers():
...         print(ti.name)
... 
lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path/.manifest.yaml
lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path
lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path/sub-1
lets_start_with_a_somewhat_long_directory_name_because_we_need_a_very_long_overall_path/sub-directory-2

Further investigation reveals that what matters is the Python version that created the archive: an archive created with Python 3.8 can be verified with Python 3.7 without error, but an archive created with Python 3.7 also fails verification with Python 3.8. Apparently, the relevant change was bpo-36268: the switch to the POSIX.1-2001 pax standard as the default format used by the tarfile module for writing tar archives.
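
The format dependence can be checked without archive-tools, using only the standard tarfile module. The following is a sketch under the assumption that the default format is the only relevant difference; `DIRNAME` and `member_names` are names made up for this example. It writes the same directory tree once with `tarfile.GNU_FORMAT` (the default up to Python 3.7) and once with `tarfile.PAX_FORMAT` (the default since Python 3.8) and prints the member names as read back. Whether the trailing slash shows up depends on the Python version running the script, so no expected output is given here.

```python
import tarfile
import tempfile
from pathlib import Path

# Same long directory name as in the report above (hypothetical constant).
DIRNAME = Path("lets_start_with_a_somewhat_long_directory_name_"
               "because_we_need_a_very_long_overall_path")


def member_names(fmt):
    """Archive a long-named directory tree using the tar format fmt
    and return the member names as tarfile reads them back."""
    with tempfile.TemporaryDirectory(prefix="tarfile-test-") as base:
        base = Path(base)
        # The path of this subdirectory is 103 characters long.
        (base / DIRNAME / "sub-directory-2").mkdir(parents=True)
        archive_path = base / "sample.tar"
        with tarfile.open(str(archive_path), "w", format=fmt) as tf:
            tf.add(str(base / DIRNAME), arcname=DIRNAME.as_posix())
        with tarfile.open(str(archive_path), "r") as tf:
            return [ti.name for ti in tf.getmembers()]


for label, fmt in [("GNU", tarfile.GNU_FORMAT), ("pax", tarfile.PAX_FORMAT)]:
    print(label)
    for name in member_names(fmt):
        print("   ", repr(name))
```

Printing with `repr` makes a trailing slash easy to spot in the output.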