Error `chunk has higher start_offset than end_offset` with end_offset=0
AndrewFasano opened this issue · 6 comments
Describe the bug
Unblob reports the error `Chunk has higher start_offset than end_offset` with an `end_offset` value of 0 for at least 58 DLINK firmware images and fails to extract files.
To Reproduce
Steps to reproduce the behavior:
- Download a sample firmware to trigger the bug with:
wget https://legacyfiles.us.dlink.com/DCS-5009L/REVA/FIRMWARE/DCS-5009L_REVA_FIRMWARE_1.00.B1.zip
- Launch unblob with command
unblob -v DCS-5009L_REVA_FIRMWARE_1.00.B1.zip
- See error:
2024-02-10 23:27.13 [error ] Unknown error happened pid=2767672
Traceback (most recent call last):
  File "/unblob/unblob/processing.py", line 246, in process_task
    self._process_task(result, task)
  File "/unblob/unblob/processing.py", line 310, in _process_task
    _FileTask(self._config, task, stat_report.size, result).process()
  File "/unblob/unblob/processing.py", line 522, in process
    unknown_chunks = calculate_unknown_chunks(outer_chunks, self.size)
  File "/unblob/unblob/processing.py", line 688, in calculate_unknown_chunks
    unknown_chunk = UnknownChunk(
  File "<attrs generated init unblob.models.UnknownChunk>", line 9, in __init__
    self.__attrs_post_init__()
  File "/unblob/unblob/models.py", line 69, in __attrs_post_init__
    raise InvalidInputFormat(
unblob.file_utils.InvalidInputFormat: Chunk has higher start_offset than end_offset: 0xac5f2-0x0
Expected behavior
A standard Linux-based filesystem should be extracted. If binwalk is run on this image, it finds a CPIO archive within LZMA-compressed data that contains ~700 files.
Environment information:
- OS: Ubuntu 22.04
- Docker
Linux b4935d734f27 6.2.2 #3 SMP PREEMPT_DYNAMIC Wed Mar 8 12:03:22 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
The following executables found installed, which are needed by unblob:
7z ✓
debugfs ✓
jefferson ✓
lz4 ✓
lziprecover ✓
lzop ✓
sasquatch ✓
sasquatch-v4be ✓
simg2img ✓
ubireader_extract_files ✓
ubireader_extract_images ✓
unar ✓
zstd ✓
Additional context
I found this bug while doing some large-scale evaluations of filesystems produced by binwalk and unblob using fw2tar.
The root cause of this issue is that a valid chunk is identified by the DMG handler, which is probably a false positive. I doubt a DMG file would be in a router firmware.
I'll look into it and keep you posted.
There's indeed a DMG file within the firmware, called `h264plugin.dmg`. The root cause is that two handlers (`bzip2`, `dmg`) rightfully identify overlapping content in a UDBZ dmg file.
More information about dmg disks can be found at https://disktype.sourceforge.net/doc/ch03s13.html
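To make the failure mode concrete, here is a minimal sketch of how two overlapping chunks that start at the same offset could produce the reversed range from the traceback. The `Chunk` class and the `gaps_between` helper are simplified stand-ins, not unblob's actual code. The layout (both chunks starting at offset 0) and the bzip2 end offset are assumptions for illustration; only `0xac5f2` comes from the error message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Simplified stand-in for a chunk: a byte range in the file."""

    start_offset: int
    end_offset: int


def gaps_between(chunks):
    """Return (start, end) ranges left uncovered between consecutive chunks.

    The logic assumes the chunks never overlap; overlapping chunks break it.
    """
    ordered = sorted(chunks, key=lambda c: c.start_offset)
    gaps = []
    for current, following in zip(ordered, ordered[1:]):
        if following.start_offset != current.end_offset:
            gaps.append((current.end_offset, following.start_offset))
    return gaps


# Assumed layout: both handlers match at offset 0 of the DMG file.
dmg = Chunk(start_offset=0x0, end_offset=0xAC5F2)
bzip2 = Chunk(start_offset=0x0, end_offset=0x1000)  # made-up end offset

for start, end in gaps_between([dmg, bzip2]):
    status = "invalid" if start > end else "ok"
    print(f"{start:#x}-{end:#x} {status}")
```

Under these assumptions the "gap" after the dmg chunk starts at `0xac5f2` and ends at the bzip2 chunk's start, `0x0`, which is the range rejected in the traceback.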
I think the `bzip2` handler should check if the bzip2 compressed stream is followed by an XML plist, indicative of a DMG file.
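A rough sketch of what such a check might look like, assuming the handler already knows where the compressed stream ends. The function name `followed_by_plist`, its parameters, and the 256-byte window are hypothetical and not part of unblob; the check only looks for an XML declaration followed shortly by the word `plist`.

```python
def followed_by_plist(data: bytes, stream_end: int, window: int = 256) -> bool:
    """Return True if an XML plist header appears right after stream_end.

    Hypothetical helper: looks at a small window after the compressed
    stream for an XML declaration that mentions a plist.
    """
    tail = data[stream_end:stream_end + window].lstrip()
    return tail.startswith(b"<?xml") and b"plist" in tail


# Usage with synthetic data: 16 placeholder bytes standing in for a stream.
fake_stream = b"BZh9" + b"\x00" * 12
plist_header = b'<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE plist PUBLIC'

print(followed_by_plist(fake_stream + plist_header, len(fake_stream)))
print(followed_by_plist(fake_stream + b"other data", len(fake_stream)))
```

A handler could use a result like this to decline the match and leave the content to the `dmg` handler, though whether that is the right place for the check is a design question.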
Another way of fixing this is changing the `contains` implementation of our chunks:
diff --git a/unblob/models.py b/unblob/models.py
index 70217c8..935bdba 100644
--- a/unblob/models.py
+++ b/unblob/models.py
@@ -85,7 +85,7 @@ class Chunk(Blob):
     def contains(self, other: "Chunk") -> bool:
         return (
-            self.start_offset < other.start_offset
+            self.start_offset <= other.start_offset
             and self.end_offset >= other.end_offset
         )
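To illustrate why the one-character change matters, here is a small sketch comparing the strict and non-strict comparisons on two chunks that share a start offset. The `Chunk` class, the `contains_strict` name, and the `remove_inner` helper are simplified illustrations rather than unblob's real implementation, and the offsets are assumed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start_offset: int
    end_offset: int

    def contains_strict(self, other: "Chunk") -> bool:
        # Old behaviour: a chunk starting at the same offset is not "inside".
        return (
            self.start_offset < other.start_offset
            and self.end_offset >= other.end_offset
        )

    def contains(self, other: "Chunk") -> bool:
        # Patched behaviour: sharing the start offset still counts as inside.
        return (
            self.start_offset <= other.start_offset
            and self.end_offset >= other.end_offset
        )


def remove_inner(chunks, contains):
    """Keep only chunks not contained by a larger, already kept chunk."""
    kept = []
    by_size = sorted(chunks, key=lambda c: c.end_offset - c.start_offset, reverse=True)
    for chunk in by_size:
        if not any(contains(outer, chunk) for outer in kept):
            kept.append(chunk)
    return kept


dmg = Chunk(0x0, 0xAC5F2)
bzip2 = Chunk(0x0, 0x1000)  # assumed end offset, for illustration only

print(len(remove_inner([dmg, bzip2], Chunk.contains_strict)))  # both survive
print(len(remove_inner([dmg, bzip2], Chunk.contains)))         # only dmg survives
```

With the strict comparison both chunks are treated as outer chunks even though one lies inside the other; with `<=` the smaller one is dropped before any gaps are computed.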
@AndrewFasano this will be fixed by #755
Thanks for the quick fix! I'll give it a try and report back.
The fix seems to work, thanks!