onekey-sec/unblob

Files hiding in long filenames

xeor opened this issue · 6 comments

xeor commented

Describe the bug
For example, with tar there is a limit to what normal GNU tar can unpack when filenames get too long. There are probably other utilities that can unpack such archives just fine, and therefore files can hide from unblob by having a long filename.

To Reproduce
Steps to reproduce the behavior:

  1. Start unblob container: docker run -it --rm --entrypoint=/bin/bash ghcr.io/onekey-sec/unblob:latest
  2. Generate a long filename: filename=$(python -c "print('X' * 300)")
  3. Create a dummy-file: touch a_file
  4. Make a tar-ball, but use --transform so you won't hit limit on filesystem: tar -cf test.tar --transform "s/a_file/${filename}/" a_file
  5. Run unblob on the file: unblob test.tar
  6. See error below
unblob@a8584cb412bd:/data/output$ unblob test.tar
2023-04-24 09:11.32 [info     ] Start processing file          file=test.tar pid=99
2023-04-24 09:11.32 [error    ] Unknown error happened while extracting chunk pid=111
Traceback (most recent call last):
  File "/home/unblob/.local/lib/python3.8/site-packages/unblob/processing.py", line 379, in _extract_chunk
    chunk.extract(inpath, extract_dir)
  File "/home/unblob/.local/lib/python3.8/site-packages/unblob/models.py", line 95, in extract
    self.handler.extract(inpath, outdir)
  File "/home/unblob/.local/lib/python3.8/site-packages/unblob/models.py", line 293, in extract
    self.EXTRACTOR.extract(inpath, outdir)
  File "/home/unblob/.local/lib/python3.8/site-packages/unblob/handlers/archive/tar.py", line 89, in extract
    tf.extractall(outdir.as_posix())
  File "/usr/local/lib/python3.8/tarfile.py", line 2028, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/home/unblob/.local/lib/python3.8/site-packages/unblob/handlers/archive/_safe_tarfile.py", line 29, in extract
    super().extract(member, path, set_attrs, numeric_owner=numeric_owner)
  File "/usr/local/lib/python3.8/tarfile.py", line 2069, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/usr/local/lib/python3.8/tarfile.py", line 2141, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/lib/python3.8/tarfile.py", line 2182, in makefile
    with bltn_open(targetpath, "wb") as target:
OSError: [Errno 36] File name too long: '/data/output/test.tar_extract/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
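For anyone who wants to reproduce this without the tar CLI (for example in a test), the same kind of archive can probably be built with Python's standard `tarfile` module. This is only a sketch: `make_long_name_tar` is a hypothetical helper name, not something that exists in unblob, and it relies on `tarfile` storing over-long names through its default extended-header format.

```python
import io
import tarfile


def make_long_name_tar(name_length: int = 300) -> bytes:
    """Build an in-memory tar archive holding one empty file with a long name.

    Hypothetical helper for reproducing the issue; not part of unblob.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        # The name only lives in the archive header, so the local
        # filesystem limit is never hit while creating the archive.
        info = tarfile.TarInfo(name="X" * name_length)
        info.size = 0
        tf.addfile(info)
    return buf.getvalue()
```

Writing the returned bytes to `test.tar` and running `unblob test.tar` should exercise the same code path as the steps above.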

Expected behavior
unblob being unblob, it should find a way to unpack it. Maybe use some tar functionality to replace long filenames with a sha256 of the filename?

Environment information (please complete the following information):

  • OS: linux

It's an interesting edge case. It's not limited to the tar format, since the limit in question is the 255-byte length limit that the filesystem imposes on each filename (NAME_MAX on typical Linux filesystems).

If we want to handle this, we need to handle these errors within unblob core and decide how. Hash renaming is a possibility, but so is truncation.
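To make the two options concrete, here is a minimal sketch that combines them: truncate the name and append a short hash of the original so that distinct long names stay distinct. Everything here is an assumption for illustration: `shorten_name`, the 255-byte limit, the `~` separator and the 16-character digest are hypothetical choices, not unblob behaviour.

```python
import hashlib

NAME_MAX = 255  # assumed per-component limit in bytes (typical on Linux)


def shorten_name(name: str, limit: int = NAME_MAX) -> str:
    """Return `name` unchanged if it fits, else a truncated, hash-suffixed name.

    Hypothetical helper; not part of unblob.
    """
    raw = name.encode("utf-8", errors="surrogateescape")
    if len(raw) <= limit:
        return name
    # The hash of the full original name keeps different long names apart.
    suffix = "~" + hashlib.sha256(raw).hexdigest()[:16]
    keep = limit - len(suffix)
    # Cut on bytes, then drop anything that no longer decodes cleanly
    # (for example a multi-byte character split by the cut).
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return prefix + suffix
```

Keeping a readable prefix makes the output easier to inspect than a bare hash, at the cost of needing a mapping back to the original name if that matters for reporting.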

xeor commented

If the fix ends up being hashing of the filename, the same trick might also be used for filenames with invalid characters in them.
Like this.

2023-04-20 21:01.27 [warning ] Path contains invalid characters, it won't be processed path=.../usr/local/go/src/archive/tar/testdata/gnu-not-utf8.tar_extract/hi����bye pid=103

Just throwing it out there since it is another unreadable/unparsable file-name, and they might have a common solution.

A stress-test would be https://go.dev/src/archive/tar/testdata/

Similar exceptions (OSError: [Errno 36] File name too long) can also be triggered by pathlib.rglob on directories containing files with long names.

So even if the extractor creates those files (somehow), we get the exception later on.

Another good source of stress-test is https://github.com/tytso/e2fsprogs/tree/master/tests, which breaks unblob right now.

vlaci commented

We added handling of non-POSIX names back in d7351fb

vlaci commented

Also, the whole path cannot be longer than 4096 bytes (PATH_MAX on Linux).

We should just capture the error and skip the file. This is what 7z and GNU tar are doing:

7z x test.tar 

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz (806EC),ASM,AES-NI)

Scanning the drive for archives:
1 file, 10240 bytes (10 KiB)

Extracting archive: test.tar
--
Path = test.tar
Type = tar
Physical Size = 10240
Headers Size = 10240
Code Page = UTF-8

ERROR: Can not open output file : File name too long : ./XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Sub items Errors: 1

Archives with Errors: 1

Sub items Errors: 1
tar xvf test.tar 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
tar: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: Cannot open: File name too long
tar: Exiting with failure status due to previous errors
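The "capture the error and skip the file" behaviour could look roughly like the sketch below. It uses the plain standard-library `tarfile` module rather than unblob's own `_safe_tarfile` wrapper, so it has none of unblob's safety checks, and `extract_skipping_long_names` is a hypothetical name chosen for illustration.

```python
import errno
import tarfile


def extract_skipping_long_names(tar_path, outdir):
    """Extract members one by one and return the names that were skipped.

    Hypothetical sketch; not how unblob's tar handler is implemented.
    """
    skipped = []
    with tarfile.open(tar_path) as tf:
        for member in tf:
            try:
                tf.extract(member, str(outdir))
            except OSError as exc:
                # Only swallow "File name too long"; re-raise anything else.
                if exc.errno != errno.ENAMETOOLONG:
                    raise
                skipped.append(member.name)
    return skipped
```

Returning the skipped names, instead of only logging them, would let the caller report that the archive contained entries that could not be written, which seems relevant given that the concern in this issue is files going unnoticed.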