FileExistsError during extraction with Netgear firmware
AndrewFasano opened this issue · 4 comments
Describe the bug
During extraction of at least 29 NETGEAR firmware images, unblob may try to create the same output file twice, triggering an exception. As a result, some files that should be extracted are not.
To Reproduce
Steps to reproduce the behavior:
- Download a sample firmware to trigger the bug with:
wget https://www.downloads.netgear.com/files/GDC/M4100/M4100-V10.0.2.20.zip
- Launch unblob with command
unblob -v M4100-V10.0.2.20.zip
- See error:
2024-02-10 23:39.13 [error ] Unknown error happened while extracting chunk pid=2295991
Traceback (most recent call last):
File "/unblob/unblob/processing.py", line 607, in _extract_chunk
if result := chunk.extract(inpath, extract_dir):
File "/unblob/unblob/models.py", line 115, in extract
return self.handler.extract(inpath, outdir)
File "/unblob/unblob/models.py", line 452, in extract
return self.EXTRACTOR.extract(inpath, outdir)
File "/unblob/unblob/handlers/archive/cpio.py", line 384, in extract
parser.dump_entries(fs)
File "/unblob/unblob/handlers/archive/cpio.py", line 215, in dump_entries
fs.carve(entry.path, self.file, entry.start_offset, entry.size, mode=entry.mode & 0o777)
File "/unblob/unblob/file_utils.py", line 511, in carve
carve(safe_path, file, start_offset, size, mode=mode)
File "/unblob/unblob/file_utils.py", line 294, in carve
with carve_path.open("xb") as f:
File "/usr/lib/python3.10/pathlib.py", line 1119, in open
return self._accessor.open(self, mode, buffering, encoding, errors,
FileExistsError: [Errno 17] File exists: '/tmp/tmp1151iav4/M4100_V10.0.2.20.zip_extract/m4100v10.0.2.20.stk_extract/1201148-2097967.lzma_extract/lzma.uncompressed_extract/lib/libthread_db-1.0.so'
Expected behavior
This error should not be raised; instead, the additional files should be extracted. I made a simple change to file_utils.py's carve method (see below) to return early if the target file already exists, and with this change an extra 75 files are created in [extract_dir]/m4100v10.0.2.20.stk_extract/1201148-2097967.lzma_extract/lzma.uncompressed_extract. I doubt this is the right fix, but it demonstrates that this bug prevents some files from being extracted.
Environment information:
- OS: Ubuntu 22.04
- Docker
Linux b4935d734f27 6.2.2 #3 SMP PREEMPT_DYNAMIC Wed Mar 8 12:03:22 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
The following executables found installed, which are needed by unblob:
7z ✓
debugfs ✓
jefferson ✓
lz4 ✓
lziprecover ✓
lzop ✓
sasquatch ✓
sasquatch-v4be ✓
simg2img ✓
ubireader_extract_files ✓
ubireader_extract_images ✓
unar ✓
zstd ✓
Additional context
I found this bug while doing some large-scale evaluations of filesystems produced by binwalk and unblob using fw2tar.
My (likely-incorrect) patch that results in additional files being created:
diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index 21e887b..3db4b98 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py
@@ -291,6 +291,9 @@ def carve(carve_path: Path, file: File, start_offset: int, size: int):
"""Extract part of a file."""
carve_path.parent.mkdir(parents=True, exist_ok=True)
+ if carve_path.exists():
+ print(f"Warning not replacing {carve_path}")
+ return
with carve_path.open("xb") as f:
for data in iterate_file(file, start_offset, size):
f.write(data)
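As an alternative to the early return above, one option is to overwrite the existing target so the later archive entry wins. The following is a hypothetical sketch, not unblob's actual code: `carve_overwrite` is an invented stand-in for unblob's `carve()`, which really streams from a `File` object with `start_offset`/`size` rather than taking bytes directly.

```python
from pathlib import Path


def carve_overwrite(carve_path: Path, data: bytes) -> None:
    """Write data to carve_path, replacing any existing file.

    Simplified stand-in for unblob's carve(): instead of raising
    FileExistsError on a duplicate entry, drop the earlier copy so
    the last entry in the archive wins.
    """
    carve_path.parent.mkdir(parents=True, exist_ok=True)
    # is_symlink() also catches dangling links that exists() misses
    if carve_path.exists() or carve_path.is_symlink():
        carve_path.unlink()
    with carve_path.open("xb") as f:  # "xb" is safe again: path was removed
        f.write(data)
```

With this behavior, extracting a cpio archive containing `/lib/libthread_db-1.0.so` twice would leave the second copy on disk instead of aborting the extraction.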
After fixing this, I got another error in the same vein in file_utils, which I patched with:
diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index 21e887b..3db4b98 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py
@@ -579,7 +582,8 @@ class FileSystem:
if safe_link:
dst = safe_link.dst.absolute_path
self._ensure_parent_dir(dst)
- dst.symlink_to(src)
+ if not dst.exists():
+ dst.symlink_to(src)
def create_hardlink(self, src: Path, dst: Path):
"""Create a new hardlink dst to the existing file src."""
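For symmetry with the carve case, a hypothetical variant of the symlink fix could replace an existing destination rather than silently skipping it, so the later archive entry wins here too. `symlink_force` below is an illustrative name, not a function in unblob:

```python
from pathlib import Path


def symlink_force(src: Path, dst: Path) -> None:
    """Create symlink dst -> src, removing any existing file or link at dst."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    # is_symlink() catches dangling links for which exists() returns False
    if dst.exists() or dst.is_symlink():
        dst.unlink()
    dst.symlink_to(src)
```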
Thank you for the very detailed report. I had a quick look and it's probably a bug in the CPIO extractor. I'll keep you posted.
The bug is triggered by a CPIO archive with the same entry stored twice:
7z l sample.cpio | grep thread_db-1.0
2012-01-11 23:17:40 ..... 32220 32220 /lib/libthread_db-1.0.so
2012-01-11 23:17:40 ..... 32220 32220 /lib/libthread_db-1.0.so
@AndrewFasano I opened the discussion on duplicate entries with a draft fix at #756 , your feedback is very welcome :)
$ cpio -t < 'M4100-V10.0.2.20.zip_extract/m4100v10.0.2.20.stk_extract/1201148-2097967.lzma_extract/lzma.uncompressed'| sort | uniq -c | sort -n
...
1 /var/run
1 /var/run/utmp
2 /lib/libthread_db-1.0.so
2 /sbin/cfe_env
2 /usr/bin/sort
2 /usr/bin/tail
2 /usr/bin/test
2 /usr/bin/tftp
2 /usr/bin/top
2 /usr/bin/traceroute
2 /usr/bin/uptime
2 /usr/bin/wc
2 /usr/bin/which
2 /usr/bin/xargs
2 /usr/bin/yes
2 /usr/sbin/chroot
It looks like only a couple of binaries got patched during the build process (and maybe some previously non-existent ones were added), so I think the proper solution would be to overwrite duplicate entries when extracting cpio archives. I would also limit the solution to the cpio extractor, and would not make a general behavior change in FileSystem.
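One way to keep the change local to the cpio extractor is to deduplicate the parsed entry list before dumping, keeping only the last occurrence of each path (so the patched binary stored later in the archive wins). This is a minimal sketch with an invented `Entry` dataclass standing in for the cpio parser's entry type; it is not the actual fix in #756:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Invented stand-in for a parsed cpio entry."""
    path: str
    start_offset: int
    size: int


def dedupe_last_wins(entries):
    """Drop duplicate paths, keeping the last occurrence of each one
    while preserving the archive's overall order."""
    last_index = {e.path: i for i, e in enumerate(entries)}
    return [e for i, e in enumerate(entries) if last_index[e.path] == i]
```

Running the extractor over the deduplicated list would then carve each path exactly once, avoiding the FileExistsError without touching FileSystem.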