onekey-sec/unblob

Error `chunk has higher start_offset than end_offset` with end_offset=0

AndrewFasano opened this issue · 6 comments

Describe the bug
Unblob reports the error `Chunk has higher start_offset than end_offset`, with an end_offset value of 0, for at least 58 D-Link firmware images and fails to extract their files.

To Reproduce
Steps to reproduce the behavior:

  1. Download a sample firmware that triggers the bug: `wget https://legacyfiles.us.dlink.com/DCS-5009L/REVA/FIRMWARE/DCS-5009L_REVA_FIRMWARE_1.00.B1.zip`
  2. Launch unblob with the command `unblob -v DCS-5009L_REVA_FIRMWARE_1.00.B1.zip`
  3. See error:
2024-02-10 23:27.13 [error    ] Unknown error happened         pid=2767672
Traceback (most recent call last):
  File "/unblob/unblob/processing.py", line 246, in process_task
    self._process_task(result, task)
  File "/unblob/unblob/processing.py", line 310, in _process_task
    _FileTask(self._config, task, stat_report.size, result).process()
  File "/unblob/unblob/processing.py", line 522, in process
    unknown_chunks = calculate_unknown_chunks(outer_chunks, self.size)
  File "/unblob/unblob/processing.py", line 688, in calculate_unknown_chunks
    unknown_chunk = UnknownChunk(
  File "<attrs generated init unblob.models.UnknownChunk>", line 9, in __init__
    self.__attrs_post_init__()
  File "/unblob/unblob/models.py", line 69, in __attrs_post_init__
    raise InvalidInputFormat(
unblob.file_utils.InvalidInputFormat: Chunk has higher start_offset than end_offset: 0xac5f2-0x0

Expected behavior
A standard Linux-based filesystem should be extracted. Running binwalk on this image finds a CPIO archive within LZMA-compressed data that contains ~700 files.

Environment information:

  • OS: Ubuntu 22.04
  • Docker
Linux b4935d734f27 6.2.2 #3 SMP PREEMPT_DYNAMIC Wed Mar  8 12:03:22 EST 2023 x86_64 x86_64 x86_64 GNU/Linux

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

The following executables found installed, which are needed by unblob:
    7z                          ✓
    debugfs                     ✓
    jefferson                   ✓
    lz4                         ✓
    lziprecover                 ✓
    lzop                        ✓
    sasquatch                   ✓
    sasquatch-v4be              ✓
    simg2img                    ✓
    ubireader_extract_files     ✓
    ubireader_extract_images    ✓
    unar                        ✓
    zstd                        ✓

Additional context
I found this bug while doing some large-scale evaluations of filesystems produced by binwalk and unblob using fw2tar.

The root cause of this issue is that a valid chunk is identified by the DMG handler, which is probably a false positive. I doubt a DMG file would be in a router firmware.

I'll look into it and keep you posted.

There is indeed a DMG file within the firmware, called h264plugin.dmg. The root cause is that two handlers (bzip2, dmg) each correctly identify overlapping content in a UDBZ DMG file.
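
To illustrate how two overlapping chunks can lead to the `0xac5f2-0x0` range in the traceback, here is a minimal sketch. It is a simplified model and not unblob's actual code: it assumes unknown chunks are computed as the gaps between consecutive chunks sorted by start offset, that both chunks start at offset 0, and that the DMG chunk ends at a hypothetical `0x100000`. The name `gaps_between` is made up for this example.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start_offset, end_offset)


def gaps_between(chunks: List[Range]) -> List[Range]:
    """Return the ranges between consecutive chunks, ordered by start offset."""
    ordered = sorted(chunks, key=lambda c: c[0])  # stable: ties keep input order
    gaps = []
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start != end:
            gaps.append((end, next_start))
    return gaps


def is_valid(r: Range) -> bool:
    """A range is valid only if it does not end before it starts."""
    return r[0] <= r[1]


bzip2_chunk = (0x0, 0xAC5F2)   # end offset taken from the reported error
dmg_chunk = (0x0, 0x100000)    # hypothetical end offset, for illustration only

gaps = gaps_between([bzip2_chunk, dmg_chunk])
for start, end in gaps:
    print(f"{start:#x}-{end:#x} valid={is_valid((start, end))}")
```

Because neither chunk was discarded as being contained in the other, the "gap" runs from the end of the bzip2 chunk back to the start of the DMG chunk, which is the invalid range the error reports.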

More information about dmg disks can be found at https://disktype.sourceforge.net/doc/ch03s13.html

I think the bzip2 handler should check if the bzip2 compressed stream is followed by an XML plist, indicative of a DMG file.
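
A rough sketch of what such a check could look like, using only Python's standard `bz2` module. This is not unblob's handler API; the function name `bzip2_followed_by_plist` and the 512-byte look-ahead window are assumptions made for illustration. It only inspects the bytes immediately after the first bzip2 stream, so a real handler would likely need to cope with several consecutive streams.

```python
import bz2


def bzip2_followed_by_plist(data: bytes, window: int = 512) -> bool:
    """Return True if the first bzip2 stream in `data` is directly followed
    by something that looks like an XML property list."""
    decompressor = bz2.BZ2Decompressor()
    try:
        decompressor.decompress(data)
    except OSError:
        # Not a valid bzip2 stream at all.
        return False
    if not decompressor.eof:
        # The stream is truncated, so there is nothing after it to inspect.
        return False
    # unused_data holds whatever follows the end of the compressed stream.
    trailing = decompressor.unused_data[:window].lstrip()
    return trailing.startswith(b"<?xml") and b"<plist" in trailing


stream = bz2.compress(b"payload")
plist = b'<?xml version="1.0" encoding="UTF-8"?>\n<plist version="1.0"><dict/></plist>'
print(bzip2_followed_by_plist(stream + plist))        # plist right after the stream
print(bzip2_followed_by_plist(stream + b"\x00" * 16))  # unrelated trailing bytes
```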

Another way of fixing this is changing the `contains` implementation of our chunks:

diff --git a/unblob/models.py b/unblob/models.py
index 70217c8..935bdba 100644
--- a/unblob/models.py
+++ b/unblob/models.py
@@ -85,7 +85,7 @@ class Chunk(Blob):
 
     def contains(self, other: "Chunk") -> bool:
         return (
-            self.start_offset < other.start_offset
+            self.start_offset <= other.start_offset
             and self.end_offset >= other.end_offset
         )
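
The effect of that one-character change can be seen in a small standalone sketch. The `Chunk` class below is a stripped-down stand-in and not the real `unblob.models.Chunk`; the two method names and the DMG end offset are hypothetical, and both chunks are assumed to start at offset 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start_offset: int
    end_offset: int

    def contains_strict(self, other: "Chunk") -> bool:
        # Behaviour before the patch: the outer chunk must start strictly earlier.
        return (
            self.start_offset < other.start_offset
            and self.end_offset >= other.end_offset
        )

    def contains_inclusive(self, other: "Chunk") -> bool:
        # Behaviour after the patch: sharing the same start offset is allowed.
        return (
            self.start_offset <= other.start_offset
            and self.end_offset >= other.end_offset
        )


dmg = Chunk(0x0, 0x100000)    # hypothetical end offset, for illustration only
bzip2 = Chunk(0x0, 0xAC5F2)   # end offset taken from the reported error

print(dmg.contains_strict(bzip2))     # False: equal starts fail the strict check
print(dmg.contains_inclusive(bzip2))  # True: the inner chunk is now recognised
```

One thing to keep in mind with the inclusive comparison is that a chunk also "contains" any chunk with identical offsets, itself included, so code that uses it to drop inner chunks should not compare a chunk against itself.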

@AndrewFasano this will be fixed by #755

Thanks for the quick fix! I'll give it a try and report back.

The fix seems to work, thanks!