Files hiding in long filenames
xeor opened this issue · 6 comments
Describe the bug
For example, with tar there is a limit on what regular GNU tar can unpack when filenames get too long. There are probably other utilities that can unpack such archives just fine, and therefore files can hide from unblob by having a long filename.
To Reproduce
Steps to reproduce the behavior:
- Start unblob container:
docker run -it --rm --entrypoint=/bin/bash ghcr.io/onekey-sec/unblob:latest
- Generate a long filename:
filename=$(python -c "print('X' * 300)")
- Create a dummy-file:
touch a_file
- Make a tarball, but use --transform so you don't hit the filename limit on your own filesystem:
tar -cf test.tar --transform "s/a_file/${filename}/" a_file
- Run unblob on the file:
unblob test.tar
- See error below
unblob@a8584cb412bd:/data/output$ unblob test.tar
2023-04-24 09:11.32 [info ] Start processing file file=test.tar pid=99
2023-04-24 09:11.32 [error ] Unknown error happened while extracting chunk pid=111
Traceback (most recent call last):
File "/home/unblob/.local/lib/python3.8/site-packages/unblob/processing.py", line 379, in _extract_chunk
chunk.extract(inpath, extract_dir)
File "/home/unblob/.local/lib/python3.8/site-packages/unblob/models.py", line 95, in extract
self.handler.extract(inpath, outdir)
File "/home/unblob/.local/lib/python3.8/site-packages/unblob/models.py", line 293, in extract
self.EXTRACTOR.extract(inpath, outdir)
File "/home/unblob/.local/lib/python3.8/site-packages/unblob/handlers/archive/tar.py", line 89, in extract
tf.extractall(outdir.as_posix())
File "/usr/local/lib/python3.8/tarfile.py", line 2028, in extractall
self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
File "/home/unblob/.local/lib/python3.8/site-packages/unblob/handlers/archive/_safe_tarfile.py", line 29, in extract
super().extract(member, path, set_attrs, numeric_owner=numeric_owner)
File "/usr/local/lib/python3.8/tarfile.py", line 2069, in extract
self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
File "/usr/local/lib/python3.8/tarfile.py", line 2141, in _extract_member
self.makefile(tarinfo, targetpath)
File "/usr/local/lib/python3.8/tarfile.py", line 2182, in makefile
with bltn_open(targetpath, "wb") as target:
OSError: [Errno 36] File name too long: '/data/output/test.tar_extract/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
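For anyone who wants to reproduce this without GNU tar's --transform, a rough equivalent of the steps above using only Python's tarfile module is sketched below. The helper names build_long_name_tar and member_names are made up for this example and are not part of unblob. It also assumes the GNU tar format, which I believe is what GNU tar writes by default.

```python
import io
import tarfile


def build_long_name_tar(name_length: int = 300) -> bytes:
    """Return the bytes of a tar archive holding one empty, long-named file."""
    buf = io.BytesIO()
    # GNU_FORMAT stores names that do not fit the classic 100-byte header
    # field in an extra "long name" record, so no file with that name ever
    # has to exist on the local filesystem.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
        info = tarfile.TarInfo(name="X" * name_length)
        info.size = 0  # empty file, so no fileobj is needed
        tf.addfile(info)
    return buf.getvalue()


def member_names(data: bytes) -> list:
    """List member names without extracting anything to disk."""
    with tarfile.open(fileobj=io.BytesIO(data)) as tf:
        return tf.getnames()
```

Listing the members works fine; it is only the extraction step that runs into the filesystem limit.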
Expected behavior
unblob being unblob, it should find a way to unpack it. Maybe use some tar functionality to replace long filenames with a sha256 of the filename?
Environment information (please complete the following information):
- OS: linux
It's an interesting edge case. It's not limited to the tar format, since the limit comes from the filesystem: most Linux filesystems cap a single filename at 255 bytes (NAME_MAX).
If we want to handle this, we need to handle these errors within unblob core and decide how. Hash renaming is a possibility, but so is truncation.
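To make the two options concrete, here is a minimal sketch that combines them: truncate the name and append a SHA-256 of the full original name so that distinct long names do not collide. The function shorten_name and the NAME_MAX constant are hypothetical, not existing unblob code, and the 255-byte limit is an assumption that holds on most Linux filesystems but not everywhere.

```python
import hashlib

NAME_MAX = 255  # assumption: per-component limit in bytes on common Linux filesystems


def shorten_name(name: str, limit: int = NAME_MAX) -> str:
    """Return name unchanged if it fits, else a truncated prefix plus a hash."""
    raw = name.encode("utf-8", errors="surrogateescape")
    if len(raw) <= limit:
        return name

    digest = hashlib.sha256(raw).hexdigest()  # 64 ASCII characters
    keep = limit - len(digest) - 1  # bytes left for the prefix and "_"
    if keep < 0:
        raise ValueError("limit is too small to hold the hash suffix")

    # Cut on bytes, then drop any multi-byte character split by the cut
    # (and any undecodable bytes) so the result is clean UTF-8.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return f"{prefix}_{digest}"
```

With the default limit, a name of 300 "X" characters becomes 190 "X" characters, an underscore, and the 64-character digest, which is exactly 255 bytes. A mapping from the shortened name back to the original would still have to be recorded somewhere, which this sketch does not do.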
If the fix ends up being hashing of the filename, the same trick might also be used for filenames with invalid characters in them.
Like this:
2023-04-20 21:01.27 [warning ] Path contains invalid characters, it won't be processed path=.../usr/local/go/src/archive/tar/testdata/gnu-not-utf8.tar_extract/hi����bye pid=103
Just throwing it out there since it is another unreadable/unparsable file-name, and they might have a common solution.
A stress-test would be https://go.dev/src/archive/tar/testdata/
Similar exceptions (OSError: [Errno 36] File name too long) can also be triggered by pathlib.rglob on directories containing files with long names.
So even if the extractor creates those files (somehow), we get the exception later on.
Another good source of stress-test is https://github.com/tytso/e2fsprogs/tree/master/tests, which breaks unblob right now.
Also, the whole path cannot be longer than 4096 bytes (PATH_MAX on Linux, which includes the terminating null byte).
We should just capture the error and skip the file. This is what 7z and GNU tar are doing:
7z x test.tar
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz (806EC),ASM,AES-NI)
Scanning the drive for archives:
1 file, 10240 bytes (10 KiB)
Extracting archive: test.tar
--
Path = test.tar
Type = tar
Physical Size = 10240
Headers Size = 10240
Code Page = UTF-8
ERROR: Can not open output file : File name too long : ./XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Sub items Errors: 1
Archives with Errors: 1
Sub items Errors: 1
tar xvf test.tar
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
tar: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: Cannot open: File name too long
tar: Exiting with failure status due to previous errors
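The "capture the error and skip the file" behaviour could look roughly like the sketch below. It uses the plain standard-library tarfile module rather than unblob's own _safe_tarfile wrapper, so it does no path sanitising, and extract_skipping_long_names is a made-up name for illustration only.

```python
import errno
import tarfile
from pathlib import Path


def extract_skipping_long_names(archive: Path, outdir: Path) -> list:
    """Extract every member that fits on the filesystem; return skipped names."""
    skipped = []
    with tarfile.open(archive) as tf:
        for member in tf:
            try:
                tf.extract(member, path=outdir)
            except OSError as exc:
                # Only swallow "File name too long" (Errno 36 on Linux);
                # any other OSError is still a real failure.
                if exc.errno != errno.ENAMETOOLONG:
                    raise
                skipped.append(member.name)
    return skipped
```

Returning the skipped names lets the caller log or report them, so that a skipped file is at least visible instead of silently hidden.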