disk image sometimes modified after hfs2dfxml.py is called
If you run hfs2dfxml.py against certain HFS disk images in order to generate DFXML it can modify the disk image:
$ sha256sum my-hfs-disk-image.001
c3f1cdbe750fa27eeb1ad18c08a135766455f5e29dbebcd049cca076a6f61ea5
$ python hfs2dfxml.py my-hfs-disk-image.001 hfs2dfxml-output.xml
$ sha256sum my-hfs-disk-image.001
6b2f340841e0cc5e7eb8bb474feb8f60e02ff8d8535f2f2cdea1f9adbe1e7bad
Have you seen this behaviour before? I may be able to supply the offending .001 disk image, if helpful.
I have not seen it before. Can you supply the .001 image so I can try it out to see what's happening?
Are you only seeing this with certain images, or does it happen every time?
Of my 4 test images, only the one is failing (M1126-0001.001 is the same as my-hfs-disk-image.001). It's not my file so I'll have to confirm that I can make it public before sharing it.
Unsure if the "humount: No volume is current" warning is relevant:
$ sha256sum workfiles/*
c3f1cdbe750fa27eeb1ad18c08a135766455f5e29dbebcd049cca076a6f61ea5 workfiles/M1126-0001.001
19bbaf48dbebf0fe4287b28ce98432e1899be40e8fe661af5abf523444c97c11 workfiles/M22296-0001.001
359c0917411db757665dd48a8a59185abfb78a173201acc5f5544aef0e165009 workfiles/M22717-0007.001
82034091b3058d01e686135a1fdfe92e37ea9284bdba07c4633c099e249271c6 workfiles/uclalsc_ml_227_026.img
$ python hfs2dfxml.py workfiles/M1126-0001.001 1.xml
humount: No volume is current
$ python hfs2dfxml.py workfiles/M22296-0001.001 2.xml
humount: No volume is current
$ python hfs2dfxml.py workfiles/M22717-0007.001 3.xml
humount: No volume is current
$ python hfs2dfxml.py workfiles/uclalsc_ml_227_026.img 4.xml
humount: No volume is current
$ sha256sum workfiles/*
3c7d5ba875162531d8cfffc53cdc1ce418593be8be809c1088b510e491c95952 workfiles/M1126-0001.001
19bbaf48dbebf0fe4287b28ce98432e1899be40e8fe661af5abf523444c97c11 workfiles/M22296-0001.001
359c0917411db757665dd48a8a59185abfb78a173201acc5f5544aef0e165009 workfiles/M22717-0007.001
82034091b3058d01e686135a1fdfe92e37ea9284bdba07c4633c099e249271c6 workfiles/uclalsc_ml_227_026.img
Note that subsequent calls change the checksum in different ways:
$ sha256sum workfiles/M1126-0001.001
c3f1cdbe750fa27eeb1ad18c08a135766455f5e29dbebcd049cca076a6f61ea5 workfiles/M1126-0001.001
$ python hfs2dfxml.py workfiles/M1126-0001.001 1.xml
humount: No volume is current
$ sha256sum workfiles/M1126-0001.001
2c2c41ec318b0265b228e679c56d1213d981b2274ae03063d070c4bd433c6e77 workfiles/M1126-0001.001
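The before/after checksum comparison shown above can be automated so that any tool run against an image is flagged as soon as it alters the bytes. The following is a minimal sketch, not part of hfs2dfxml.py; the helper names `sha256_of` and `run_and_check` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_and_check(image_path, action):
    """Run action(image_path) and report whether the image bytes changed.

    Returns a (before, after, changed) tuple: two hex digests and a boolean.
    """
    before = sha256_of(image_path)
    action(image_path)
    after = sha256_of(image_path)
    return before, after, before != after
```

Here `action` could be any callable that takes the image path, for example one that invokes hfs2dfxml.py through `subprocess.run`.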
Unsure if it's relevant, but the M1126-0001.001 image contains a .DS_Store file.
Note also that I'm using a dev branch of a fork: https://github.com/Hwesta/hfs2dfxml/tree/patch-1
Thanks for the additional information. The "humount: No volume is current" message isn't a problem here. The script is just checking to ensure another image isn't already mounted (or was not cleanly unmounted from a previous attempt).
Do let me know if you're able to send the disk image -- otherwise, we can go through the hfsutils calls one by one to see which one is causing the problem.
If the M1126-0001.001 disk image is made read-only, do you get an error running hfs2dfxml.py?
Have you tried running a cmp between the two files? cmp -l original_image changed_image would help give an idea of the extent of the change.
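Where `cmp` is not available, the same comparison can be sketched in Python. This is an illustrative sketch only: `diff_offsets` and `changed_blocks` are hypothetical helper names, offsets are 0-based (unlike the 1-based offsets `cmp -l` prints), and the 512-byte block size is an assumption made for grouping the results.

```python
def diff_offsets(original_path, changed_path, chunk_size=1 << 16):
    """Yield (offset, original_byte, changed_byte) for each differing byte.

    Offsets are 0-based. Comparison stops at the end of the shorter file.
    """
    offset = 0
    with open(original_path, "rb") as a, open(changed_path, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if not chunk_a or not chunk_b:
                break
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        yield offset + i, x, y
            offset += min(len(chunk_a), len(chunk_b))


def changed_blocks(original_path, changed_path, block_size=512):
    """Return the sorted block numbers that contain at least one difference."""
    return sorted(
        {off // block_size for off, _, _ in diff_offsets(original_path, changed_path)}
    )
```

Grouping the differing offsets by block gives a quick sense of whether the change is confined to one small region or spread across the image.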
@jrwdunham -- Just circling back to this: were you able to see if you could share the disk image with me so I can test this out on my system?
@dd388 unfortunately the NYPL folks may not be available in the very near future to give me authorization to share this image. I'll let you know when I've heard from them and hopefully I can find some time to get more technical details on this issue (cf. suggestions above) in the meantime.
I do have a copy of the disk image (thank you @jrwdunham!).
Preliminary testing shows just the act of mounting the image using hmount and humount is changing it, independent of my script... More soon.
First test -- ran hmount/humount on the disk image, and then used unhfs to export all of the files to a directory. Then I took a clean copy of the disk image and ran unhfs to export all of the files to a different directory.
Did comparisons of all exported files -- as far as I can tell, they're all the same. Curious...
(I also did a hexdiff between the altered disk image and the original one, but I couldn't make sense of the results, i.e., whether the altered bits actually corresponded to any files.)
strace fun ahead...
Log output shows the disk image is opened with the flag O_RDONLY, but then a few lines down it is opened again as O_RDWR.
However, I see two write lines, where a small number of bytes of data is written to the file. This does not happen to a control disk image (i.e., one that isn't changed by hmount). But I will note that the control disk image is also first opened as O_RDONLY, then as O_RDWR (though it never gets written to).
I'll have to dig deeper to figure out what this data is, and why it's being written to the file. I'll also try to think about / work on a workaround/fix that sets the file as readonly before the script gets called.
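One way such a read-only workaround might look is a context manager that clears the write permission bits on the image for the duration of a call and restores them afterwards. This is a sketch under assumptions, not the fix adopted in hfs2dfxml.py: `read_only` is a hypothetical helper, it relies on POSIX file permissions, it does not stop a process running as root from writing, and how hmount reacts to a read-only file is outside what it covers.

```python
import os
import stat
from contextlib import contextmanager


@contextmanager
def read_only(path):
    """Temporarily clear all write permission bits on path.

    The original mode is restored on exit, even if the body raises.
    """
    original_mode = stat.S_IMODE(os.stat(path).st_mode)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    os.chmod(path, original_mode & ~write_bits)
    try:
        yield path
    finally:
        os.chmod(path, original_mode)
```

Wrapping the mount-and-list steps in `with read_only(image_path):` would keep the permission change scoped to the script run rather than leaving the image permanently read-only.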
I re-compiled hfsutils with --enable-debug and ran hmount through gdb, mounting the disk image in question. This part of the log seems relevant:
VOL: "DISKIMG" not cleanly unmounted
VOL: scavenging...
BLOCK: WRITE vol 0x620620 block 2
BLOCK: CACHE vol 0x620620 "DISKIMG" hit/miss ratio = 1.500
VOL: scavenging complete
Then the usual hmount output.
So that suggests to me that something is being set so that the disk image is now seen as "cleanly unmounted." In fact, the second time I mount the disk image, it doesn't show that error.
If I take the disk image, and mount it subsequent times (after the first mount that edits it), it does not seem to change after that. But the way it changes the first time is different each time (which is what @jrwdunham reported.)
So, this doesn't necessarily suggest a tidy solution (to me, so far) but I'll keep digging.
Make of this what you can: file reports there is "Macintosh HFS data (mounted)" on this image.
According to libmagic, the "mounted" part is triggered by a specific pattern: http://www.obscure.org/webmail/program/lib/magic
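To inspect the flag directly rather than through file, the volume attributes can be read from the image header. The sketch below rests on my understanding of the classic HFS on-disk layout, not on the hfsutils source or the magic file linked above: it assumes an unpartitioned image with the Master Directory Block at byte 1024, a big-endian 16-bit attributes field 10 bytes into it, bit 8 (0x0100) meaning "unmounted cleanly" and bit 15 (0x8000) meaning software-locked. The function names are hypothetical, and the offsets should be verified against a format reference before being relied on.

```python
import struct

MDB_OFFSET = 1024           # assumed: MDB location in an unpartitioned HFS image
HFS_SIGNATURE = 0x4244      # "BD" signature at the start of the MDB
ATTR_OFFSET = 10            # assumed: offset of the attributes field within the MDB
UNMOUNTED_CLEANLY = 0x0100  # assumed: bit 8 set when the volume was unmounted cleanly
SOFTWARE_LOCKED = 0x8000    # assumed: bit 15 set when the volume is software-locked


def parse_hfs_flags(mdb):
    """Decode the signature and attribute flags from the start of an MDB."""
    (signature,) = struct.unpack_from(">H", mdb, 0)
    if signature != HFS_SIGNATURE:
        raise ValueError("not an HFS Master Directory Block: 0x%04x" % signature)
    (attributes,) = struct.unpack_from(">H", mdb, ATTR_OFFSET)
    return {
        "attributes": attributes,
        "unmounted_cleanly": bool(attributes & UNMOUNTED_CLEANLY),
        "locked": bool(attributes & SOFTWARE_LOCKED),
    }


def read_hfs_flags(image_path):
    """Read the attribute flags from an image file without modifying it."""
    with open(image_path, "rb") as handle:
        handle.seek(MDB_OFFSET)
        return parse_hfs_flags(handle.read(ATTR_OFFSET + 2))
```

Because the file is opened with mode "rb", this check never writes to the image, so it can be run on the original before any mount attempt.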
If you set the file to readonly (in the filesystem) and run hmount, it does now report that the volume is "locked". There doesn't seem to be an issue with running hls, though -- all of the contents of the image that I'm expecting are there.