bacongravy/macinbox

hdiutil failed with non-zero exit code: 16

Brantone opened this issue · 13 comments

Still debugging to try to figure out what's going on, but figured would post here in case anything obvious.
The command:

sudo macinbox --box-format vmware_fusion --name macos-10.14.3-1 --disk 60 --no-hidpi --debug

The part where it errors out:

Feb  9 18:33:37  installer[57575] <Notice>: Calling completion handler for 0x7fcff1800790

  + Configuring the primary user account...
  + Installing the default insecure vagrant ssh key...
  + Enabling password-less sudo...
  + Enabling sshd...
File Doesn't Exist, Will Create: /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.LTbQT8AA/scratch_mountpoint/private/var/db/com.apple.xpc.launchd/disabled.plist
  + Saving the image...
    - Detaching the image...
"disk17" unmounted.
hdiutil: couldn't eject "disk17" - Resource busy
    - /usr/bin/hdiutil failed with non-zero exit code: 16. Sleeping and retrying...
    - Detaching the image...
"disk17" unmounted.
hdiutil: couldn't eject "disk17" - Resource busy
    - /usr/bin/hdiutil failed with non-zero exit code: 16. Sleeping and retrying...
    - Detaching the image...
"disk17" unmounted.
hdiutil: couldn't eject "disk17" - Resource busy
    - /usr/bin/hdiutil failed with non-zero exit code: 16. Sleeping and retrying...
    - Detaching the image...
"disk17" unmounted.
hdiutil: couldn't eject "disk17" - Resource busy
    - /usr/bin/hdiutil failed with non-zero exit code: 16. Sleeping and retrying...
    - Detaching the image...
"disk17" unmounted.
hdiutil: couldn't eject "disk17" - Resource busy
    - Error: /usr/bin/hdiutil failed with non-zero exit code: 16

• Cleaning up...
• WARNING: Temporary files were not removed. Run this command to remove them:
• sudo rm -rf /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.LTbQT8AA \
/var/folders/nx/q112c5wx7vn_0bmv9vdhgp7hqpw449/T/macinbox_user_temp.E1zHpv3N \
/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/macinbox_root_temp.eIrucfbr

Not sure if hdiutil actually outputs anything, but would be nice to have bit more verbosity ... like what is the exact command that is being run?

Will post things as I find them.

The command being run in this case is /usr/bin/hdiutil detach /dev/disk17:

Task.run %W[ /usr/bin/hdiutil detach #{@disk_device} ] + @quiet_flag

You could try changing the max_attempts in that function, or making the sleep longer, as a workaround. The reason why the function does multiple attempts is that it seems to take a while for APFS volumes to "quiesce" after having lots of data written to them. On my machine, 60 seconds is always long enough, but on a different machine, under different load, it could conceivably take longer.

Yeah, that's the around the part where I'm digging into : why is the resource still busy, or what (if anything) is still going on ... had to take a break, but couple hours later and trying manually:

$ sudo /usr/bin/hdiutil detach /dev/disk17
Password:
"disk17" unmounted.
hdiutil: couldn't eject "disk17" - Resource busy

So back to digging in "what the what" is causing the hang, pretty sure this is something on my system, but would still be handy if, in a situation like this, macinbox would spit out the command that failed (rather than having to dig through code to find it).

That's surprising that the disk would still be busy after a few hours. The lsof tool might reveal if something is holding a file open on the disk. From the logs you posted, you'd be looking for something open under the scratch_mountpoint directory, e.g. :

sudo lsof | grep scratch_mountpoint

Surprisingly :

# rm -rf /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.ii08lvPL/scratch_mountpoint
rm: /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.ii08lvPL/scratch_mountpoint: Resource busy

Hence

# lsof | grep scratch_mountpoint
mds          80               root   20r      DIR               1,41          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.vUrgBZxP/scratch_mountpoint
mds          80               root   23r      DIR               1,17          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.ii08lvPL/scratch_mountpoint
mds          80               root   30r      DIR               1,17          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.ii08lvPL/scratch_mountpoint
mds          80               root   36r      DIR               1,17          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.ii08lvPL/scratch_mountpoint
mds          80               root   38r      DIR               1,29          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.3778RisT/scratch_mountpoint
mds          80               root   39r      DIR               1,29          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.3778RisT/scratch_mountpoint
mds          80               root   42r      DIR               1,29          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.3778RisT/scratch_mountpoint
mds          80               root   43r      DIR               1,41          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.vUrgBZxP/scratch_mountpoint
mds          80               root   48r      DIR               1,41          64          2 /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.vUrgBZxP/scratch_mountpoint

And this is even with having /private/var/folders in the "Privacy" tab in System Preferences > Spotlight.
No idea why mds is deciding to start searching it.

Can do eject even though detach errors out ... wondering of drawbacks to bypassing detach and going straight to eject.

Thanks, @Brantone. The man page for hdiutil says:

eject is a synonym for detach. In common operation, detach is very similar to diskutil(8)'s eject.

Which eject did you use? I'm willing to test out changing away from hdiutil detach if there's something else that works better for you and equally as well for me.

Currently still debating if eject is best option.
Also wondering detach was preferred in the first place? Ex: why not just use the detach! method all the time since it has detach -quiet -force and doesn't care for exit status?

I worry that doing a detach! immediately after the install would result in a corrupted filesystem. For some reason the device usually stays busy for about 20 seconds after the install completes.

Later on tonight I can submit a PR the experiment I tested and can go from there.
I did get it to successfully create a Vagrant box, but it still failed on clean up:

• Installing box...
  + Copying box to /Users/USER/.vagrant.d/boxes...
    - Installed box: macos-10.14.3-1 (vmware_fusion, 10.14.3)
• Cleaning up...
• WARNING: Temporary files were not removed. Run this command to remove them:
• sudo rm -rf /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_box_from_vmdk.TrSqD5ZE \
/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_vmdk_from_image.zXmCd4WQ \
/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/create_image_from_installer.8F6RO5qG \
/var/folders/nx/q112c5wx7vn_0bmv9vdhgp7hqpw449/T/macinbox_user_temp.oJ4rjFRv \
/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/macinbox_root_temp.MU3FAEJd

Doing lsof showed all the files in those subdirs are still open, so something is holding onto them. If no one else has run into it, will just chalk it up to something unique in my dev environment.

My own experiments with eject seem successful, the first diskutil eject after the install fails, but then the second attempt after waiting 15 seconds succeeds. I'm working on getting that merged, hopefully it helps with your situation.

The temporary files are intentionally not removed when you pass the --debug option, but all of the mounted disk images are supposed to be properly unmounted when the script ends. Does the rm -rf command not work for you even after the box is installed correctly?

I've pushed up 4888511 which replaces hditutil detach with diskutil eject. Hopefully that helps!

I've also pushed up 5f78830 which adds a --verbose flag which prints out the commands that are run, but unlike --debug doesn't leave behind the temporary directories.

Fwiw, this was the minor tweak I made just to get the ball rolling again:
master...Brantone:master
Still no idea why those folders are open / in use, thus the rm -rf fails with error as expected. The box still does get created, so not as concerned about it.

$ pwd
/Users/USER/.vagrant.d/boxes/macos-10.14.3-1/10.14.3/vmware_fusion
HOST:vmware_fusion USER$ ls
Vagrantfile		macinbox-Snapshot1.vmsn	macinbox.vmsd		macinbox.vmxf
macinbox-000001.vmdk	macinbox.vmdk		macinbox.vmx		metadata.json