mhx/dwarfs

[Core Dump] Signal 7 (SIGBUS) (code: nonexistent physical address) on making archive of currently running OS (possibly bad use case)

samuelwatsonofficial opened this issue · 2 comments

I was trying to make a quick backup of essential files using dwarfs, when I probably should have copied them first so that they were not being accessed by other applications. Before I show the error: it may just be that a bad procedure like this does not fit dwarfs' use case. However, I will leave this bug report here in case I am wrong and the crash was due to something else.
"
[jan@archlinux ~]$ mkdwarfs -i /home/jan -o jan2.dwarfs

writing: /home/jan/scratux/src/node_modules/fbjs/node_modules/core-js/client/library.js
156,062 dirs, 51,803/11,805 soft/hard links, 971,726/971,726 files, 0 other
original size: 147.9 GiB, hashed: 35.9 GiB (911,860 files, 53.58 MiB/s)
scanned: 129.6 GiB (644,992 files, 69.9 MiB/s), categorizing: 0 B/s
saved by deduplication: 18.32 GiB (314,929 files), saved by segmenting: 20.14 GiB
filesystem: 62.13 GiB in 3,976 blocks (4,082,826 chunks, 337,487/644,991 fragments, 644,992 inodes)
compressed filesystem: 3,916 blocks/37.69 GiB written
████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ ▏ 81% 🌑
/home/jan/scratux/src/node_modules/fbjs/node_modules/core-js/client/library.js ███████████████████████████████▏ ▏46.39 MiB/s
[compressing] compressed 54.3 GiB to 30.78 GiB (ratio 56.70%) 16.64 MiB/s
*** Aborted at 1713366803 (Unix time, try 'date -d @1713366803') ***
*** Signal 7 (SIGBUS) (0x7de42726c000) received by PID 15676 (pthread TID 0x7de3f22f96c0) (linux TID 18615) (code: nonexistent physical address), stack trace: ***
@ 00000000002687fb (unknown)
@ 000000000003c76f (unknown)
@ 00000000001f4717 (unknown)
@ 000000000012de39 (unknown)
@ 00000000002ce9fb (unknown)
@ 00000000000e1942 execute_native_thread_routine
/usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/thread.cc:104
@ 000000000008b559 (unknown)
@ 0000000000108a3b (unknown)
Bus error (core dumped)
"
After a quick Google search, I found someone who got the error "(linux TID 18615) (code: nonexistent physical address), stack trace: ***" in an unrelated application because they were running two scripts that accessed the same file at the same time (https://stackoverflow.com/questions/58118497/a-possible-cause-of-bus-error-nonexistent-physical-address).
I was trying to make a dwarfs archive of my whole user drive, so it is far from impossible that this has the same cause and does not need to be addressed. It could also conceivably be caused by a lack of available RAM or something else.
If there is any further debugging I can help with, I am more than happy to do so.
Thank you!

mhx commented

Oh, that is unfortunate. It must have taken a while to get there.

The primary source of bus errors in DwarFS is memory-mapped files that disappear or change shape (for example, get truncated) while being mapped. A long-term goal is to place the memory-mapping code behind an abstraction layer and have an alternative implementation using regular file I/O. The latter would likely be slower, but would be safe from SIGBUS.
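To illustrate why regular file I/O is safe here, below is a minimal Python sketch. It is not DwarFS code, and the function name `read_with_pread` is made up for illustration. It assumes a POSIX system where `os.pread` is available. The point is that reading past the end of a file that has shrunk gives a short (or empty) result the caller can handle, instead of a signal.

```python
import os


def read_with_pread(path: str, offset: int, length: int) -> bytes:
    """Read up to `length` bytes at `offset` using plain file I/O.

    If the file has been truncated in the meantime, this returns fewer
    bytes than requested (possibly none) rather than raising a signal.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while length > 0:
            chunk = os.pread(fd, length, offset)
            if not chunk:  # end of file reached: the file is shorter than expected
                break
            chunks.append(chunk)
            offset += len(chunk)
            length -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

A caller can compare `len(result)` with the requested length and report the file as modified, which is the kind of error handling a memory mapping does not allow.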

When running mkdwarfs, all input files are memory-mapped sooner or later, sometimes multiple times. If these files change shape while being accessed, this can lead to SIGBUS.
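The effect can be reproduced outside of mkdwarfs. The following is a sketch that assumes Linux and CPython; the names `CHILD` and `run_demo` are made up for illustration. A child process maps a file, truncates it itself (standing in for another program shrinking the file), and then touches the mapping. The crash is confined to the child so that the parent can inspect how it died.

```python
import mmap
import os
import signal
import subprocess
import sys
import tempfile

# Script run in a child process: map a file, shrink it, touch the mapping.
CHILD = """
import mmap, os, sys
path = sys.argv[1]
with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
os.truncate(path, 0)  # stands in for another program shrinking the file
m[0]                  # page fault on a page with no file data behind it
"""


def run_demo() -> int:
    """Run the child and return its exit status (negative means killed by a signal)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (4 * mmap.PAGESIZE))
    try:
        return subprocess.run([sys.executable, "-c", CHILD, tmp.name]).returncode
    finally:
        os.unlink(tmp.name)
```

On Linux, I would expect `run_demo()` to return `-signal.SIGBUS`, i.e. the child is killed in the same way mkdwarfs was in the report above.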

In the past, the more common cause of bus errors was DwarFS images residing on faulty USB media, which then failed to extract or caused the FUSE driver to crash.

I don't think this is due to RAM shortage. As you've already pointed out, the most likely cause is files being mutated while also being accessed by mkdwarfs.

I'm inclined to close this as wontfix, but please let me know if you can reproduce this when the input to mkdwarfs is not being mutated.
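One way to check whether the input was mutated during a run is to record each file's size and modification time before and after. This is a rough sketch, not part of DwarFS; `snapshot` and `changed_paths` are hypothetical helper names. It can miss changes that preserve both size and mtime.

```python
import os
import stat


def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Map every regular file under `root` to (size, mtime in nanoseconds)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            if stat.S_ISREG(st.st_mode):
                state[path] = (st.st_size, st.st_mtime_ns)
    return state


def changed_paths(before: dict, after: dict) -> list[str]:
    """Return paths that were added, removed, or differ in size or mtime."""
    return sorted(
        p for p in before.keys() | after.keys() if before.get(p) != after.get(p)
    )
```

Taking `before = snapshot(root)` ahead of the mkdwarfs run and `after = snapshot(root)` once it finishes, a non-empty `changed_paths(before, after)` would suggest the input was not stable.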

samuelwatsonofficial commented

I'm happy to close this. As I stated earlier, copying all the files first before making an archive is reasonable if they are being changed. Thanks for the quick response.