Some directories are not reported
kukovecz opened this issue · 6 comments
When extracting chunks, there is logic that handles whole chunks differently, here. As a result, some directories are not reported in some cases.
Reproduce this with this test file: test.zip. It is actually from the integration test suite, but I had to zip it so that GitHub would let me attach it.
If I run this file with unblob and check the report, I get the following item:
A part of the generated report json
{
"task": {
"path": "/tmp/fruits.lvl1.lzh",
"depth": 0,
"chunk_id": "",
"__typename__": "Task"
},
"reports": [
{
"path": "/tmp/fruits.lvl1.lzh",
"size": 146,
"is_dir": false,
"is_file": true,
"is_link": false,
"link_target": null,
"__typename__": "StatReport"
},
{
"magic": " LHarc 1.x/ARX archive data [lh0], 0x0 OS, with \"apple.txt\"\\012- data",
"mime_type": "application/x-lzh-compressed",
"__typename__": "FileMagicReport"
},
{
"md5": "cf71709694cd2f3e98fcf87524194beb",
"sha1": "701248bfd7dd7a7360ce237754a82425d1d13346",
"sha256": "e016f42094b088058e7fa5d9c3f98bafaeac87899205192d95b8001f72058a0f",
"__typename__": "HashReport"
},
{
"chunk_id": "47941:3",
"handler_name": "lzh",
"start_offset": 96,
"end_offset": 146,
"size": 50,
"is_encrypted": false,
"extraction_reports": [],
"__typename__": "ChunkReport"
},
{
"chunk_id": "47941:2",
"handler_name": "lzh",
"start_offset": 47,
"end_offset": 96,
"size": 49,
"is_encrypted": false,
"extraction_reports": [],
"__typename__": "ChunkReport"
},
{
"chunk_id": "47941:1",
"handler_name": "lzh",
"start_offset": 0,
"end_offset": 47,
"size": 47,
"is_encrypted": false,
"extraction_reports": [],
"__typename__": "ChunkReport"
}
],
"subtasks": [
{
"path": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract",
"depth": 1,
"chunk_id": "47941:3",
"__typename__": "Task"
},
{
"path": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract",
"depth": 1,
"chunk_id": "47941:2",
"__typename__": "Task"
},
{
"path": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract",
"depth": 1,
"chunk_id": "47941:1",
"__typename__": "Task"
}
],
"__typename__": "TaskResult"
}
This means, when unblob handles /tmp/fruits.lvl1.lzh, it will create 3 subtasks:
/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract
/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract
/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract
And will continue to run for those (sub)tasks. However, a task for the /tmp/unblob/fruits.lvl1.lzh_extract directory is never created, so that directory is just there in the file system without actually being in the generated report.
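To make the gap concrete, here is a minimal sketch that takes one TaskResult entry from the report JSON and lists the parent directories of its subtasks that never appear as a task path. The helper name `unreported_parents` is made up for illustration and is not part of unblob; the only assumption is the `task` / `subtasks` shape shown in the report excerpt above.

```python
from pathlib import PurePosixPath


def unreported_parents(task_result: dict) -> set[str]:
    """Return parent directories of subtasks that are not task paths themselves.

    Hypothetical helper: it only relies on the "task" and "subtasks" keys
    visible in the report excerpt, nothing else from unblob.
    """
    known = {task_result["task"]["path"]}
    known.update(sub["path"] for sub in task_result["subtasks"])
    parents = {
        str(PurePosixPath(sub["path"]).parent) for sub in task_result["subtasks"]
    }
    return parents - known


# Trimmed copy of the TaskResult from the report above.
result = {
    "task": {"path": "/tmp/fruits.lvl1.lzh"},
    "subtasks": [
        {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract"},
        {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract"},
        {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract"},
    ],
}

missing = unreported_parents(result)
print(missing)
```

For the sample report this flags only /tmp/unblob/fruits.lvl1.lzh_extract, which is the directory the issue is about.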
The directory not being reported/processed as a Task is an auxiliary directory that is used only to carve chunks to. We have not assigned any report to it yet, because it has not been necessary so far.
If it is really needed, a new report type on chunks (CarveReport?) could resolve this.
Related: #326.
I am not sure we need to do anything with it, though.
An option could be to move the carved files out of the extraction tree structure and store them separately. Also, in most cases we are deleting the carves, and carves are easily reproducible.
This way we can use the following extraction tree structure:
- /tmp/unblob/fruits.lvl1.lzh_96-146_extract/
- /tmp/unblob/fruits.lvl1.lzh_47-96_extract/
- /tmp/unblob/fruits.lvl1.lzh_0-47_extract/
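As a rough sketch of the naming scheme proposed above, the function below builds the flat extraction directory name from the blob name and the chunk offsets. `flat_extract_dir` and its parameters are hypothetical names used only to illustrate the layout; this is not how unblob currently names directories.

```python
from pathlib import PurePosixPath


def flat_extract_dir(extract_root: str, blob_name: str, start: int, end: int) -> str:
    """Build '<root>/<blob>_<start>-<end>_extract' (hypothetical layout)."""
    return str(PurePosixPath(extract_root) / f"{blob_name}_{start}-{end}_extract")


# Chunk offsets taken from the ChunkReport entries in the report above.
chunks = [(96, 146), (47, 96), (0, 47)]
dirs = [flat_extract_dir("/tmp/unblob", "fruits.lvl1.lzh", s, e) for s, e in chunks]
for d in dirs:
    print(d)
```

With this layout every extraction directory sits directly under the extraction root, so no intermediate carve directory is left without a report.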
This issue is causing problems with people wanting to do nice things with the unblob API from Python. See #878
This was blocking my ability to map between extraction directories and the blobs they were derived from with the API, so I took a stab at it in #891. I didn't figure out how to add a new task/subtask for carving; instead I just added a new report type that logs the source and destination of each carve.
With the example fruits.lvl1 file, the following new outputs are produced in the log, which allows a consumer of the log to map between the fruits.lvl1.lzh file and the 3 carved files: fruits.lvl1.lzh_extract/96-146.lzh, fruits.lvl1.lzh_extract/47-96.lzh, and fruits.lvl1.lzh_extract/0-47.lzh.
{
"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
"carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh",
"start_offset": 96,
"end_offset": 146,
"handler_name": "lzh",
"__typename__": "CarveReport"
},
{
"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
"carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh",
"start_offset": 47,
"end_offset": 96,
"handler_name": "lzh",
"__typename__": "CarveReport"
},
{
"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
"carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh",
"start_offset": 0,
"end_offset": 47,
"handler_name": "lzh",
"__typename__": "CarveReport"
},
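A consumer of the log could use these entries roughly as follows. This is a minimal sketch that groups carved files by the blob they came from; the function name `carves_by_source` is made up for illustration, and the only assumption is that each entry carries the `carved_from`, `carved_to` and `__typename__` fields shown above.

```python
from collections import defaultdict


def carves_by_source(reports: list[dict]) -> dict[str, list[str]]:
    """Group carved file paths by the blob they were carved from.

    Hypothetical helper: entries that are not CarveReport are ignored.
    """
    mapping: dict[str, list[str]] = defaultdict(list)
    for report in reports:
        if report.get("__typename__") == "CarveReport":
            mapping[report["carved_from"]].append(report["carved_to"])
    return dict(mapping)


# Trimmed copies of the report entries above, plus one unrelated report type.
reports = [
    {"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
     "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh",
     "__typename__": "CarveReport"},
    {"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
     "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh",
     "__typename__": "CarveReport"},
    {"carved_from": "/tmp/unblob/fruits.lvl1.lzh",
     "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh",
     "__typename__": "CarveReport"},
    {"md5": "cf71709694cd2f3e98fcf87524194beb", "__typename__": "HashReport"},
]

by_source = carves_by_source(reports)
for source, carved in by_source.items():
    print(source, "->", carved)
```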