The root fs can get corrupted on a reset
The root fs is ext3, a journaling fs that should be resilient to corruption. But it can still get corrupted on a system reset somehow.
This problem can be addressed in different ways, for example:
- Update to ext4
- Investigate more closely why the fs gets corrupted even though it shouldn't
- Do a "sync" before reset
It is not at all certain that ext4 solves the problem! On the contrary, a quick test (patched "diskim") shows that some files become zero-size after reset, possibly along with a bunch of other problems.
It would be interesting to know why the fs gets corrupted. Possibly some setup is missing, or the journal is on a RAM fs. I suspect the same problem causes ext4 to fail (badly).
The easy fix that works is to do a "sync". A reset doesn't come out of the blue; it's initiated by a test case. So it's simple to just "sync" before reset. This is the way I'll be going at first.
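A minimal sketch of the "sync before reset" idea. This is not the actual change. `run_on_vm`, the `root@vm-<n>` ssh target and the `sync_then_reset` wrapper are assumptions made for illustration; only `./k8s-ha.sh reset_vm` appears in this issue.

```shell
# Assumed way to run a command on the guest; replace with whatever
# the test environment really provides (hypothetical helper).
run_on_vm() {
    ssh "root@vm-$1" "$2"
}

# Thin wrapper around the reset command used in this issue.
reset_vm() {
    ./k8s-ha.sh reset_vm "$1"
}

# Flush dirty pages on the guest first, then reset it.
# A failed sync only produces a warning; the reset still happens.
sync_then_reset() {
    run_on_vm "$1" sync || echo "warning: sync failed on vm-$1" >&2
    reset_vm "$1"
}
```

A test case would then call `sync_then_reset 191` instead of resetting directly. This only narrows the window: anything written between the sync and the reset can still be lost.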
Reproduce
Found in ovl/k8s-haproxy, and can be reproduced there:
./k8s-ha.sh test start > $log; ./k8s-ha.sh reset_vm 191
# on vm-191
less /etc/haproxy/haproxy.cfg-foobar
The file has more than 80 NUL characters appended and the servers are missing. The reset must be done immediately, otherwise the fs may be synced automatically.
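To check for this symptom without paging through the file, the NUL bytes can be counted. This is just a sketch; `count_nul_bytes` is a hypothetical helper and not part of the ovl scripts.

```shell
# Print the number of NUL (0x00) bytes in the file given as $1.
# tr deletes every byte except NUL, wc counts what is left.
count_nul_bytes() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

For example, `count_nul_bytes /etc/haproxy/haproxy.cfg-foobar` on vm-191 should print 0 for an intact text file; a non-zero count indicates the padding described above.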
BTW
A reboot from within a VM works without problems, since it does a "sync" automatically.
Honestly, I can't see any way to totally avoid corruption on a sudden reset; you can only minimize the risk. I think the "sync" before reset is the best we can do. So, closing...