resize2fs corrupts extents on ext4 filesystem (offline shrinking)
I have come across a situation where resize2fs corrupts an ext4 filesystem, and I can reproduce it reliably.
I captured an image file with "e2image -r" prior to the resize and have put together a script that reproduces the bug on 1.47.0. I can't seem to upload the large file to GitHub, so I have cloned this repo and committed my test files at https://github.com/viavi-ab/e2fsprogs/tree/master/ab_test.
The ext4 image that I'm resizing is from a working server that, prior to the resize, had no filesystem problems.
This is an offline resize (shrink) of a large (2TB) filesystem containing a single (very large) sparse file scattered across ~8000 extents (the original image was sanitized, all other files removed, and then zerofree'ed for better compression and to hide data that I'm not allowed to share).
My script goes like this:
e2fsck -f -p work.img
resize2fs work.img 1500000000k
e2fsck -f -p work.img
The filesystem passes the first e2fsck, resize2fs reports no problems, yet the second e2fsck complains (amongst other things) about:
Inode 12, end of extent exceeds allowed value
(logical block 2507128, physical block 640097, len 1)
Inode 12 has an invalid extent
(logical block 2507128, invalid physical block 389065080, len 1)
Inode 12, end of extent exceeds allowed value
(logical block 2515444, physical block 648221, len 1)
Inode 12 has an invalid extent
(logical block 2515444, invalid physical block 389073396, len 1)
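For anyone scripting the three steps above, here is a minimal sketch that wraps them and decodes the e2fsck exit status, which e2fsck(8) documents as a bitmask. The function names `describe_fsck_status` and `reproduce` are hypothetical helpers, not part of the reproducer in the repository.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the original reproducer.

# Print a readable description of an e2fsck exit status (a bitmask per e2fsck(8)).
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" "2:reboot recommended" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:canceled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}
        text=${pair#*:}
        if [ $(( code & bit )) -ne 0 ]; then
            out="${out:+$out, }$text"
        fi
    done
    echo "$out"
}

# Run check / shrink / check on an image and report both e2fsck results.
reproduce() {
    img=$1
    new_size=$2
    rc=0
    e2fsck -f -p "$img" || rc=$?
    echo "e2fsck before resize: $(describe_fsck_status "$rc")"
    resize2fs "$img" "$new_size" || return 1
    rc=0
    e2fsck -f -p "$img" || rc=$?
    echo "e2fsck after resize: $(describe_fsck_status "$rc")"
    return "$rc"
}

# Example (not run here): reproduce work.img 1500000000k
```

With "-p", problems that e2fsck cannot fix automatically should show up as the "errors left uncorrected" bit in the second run.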
The bug does not depend on the presence of other files or the amount of free space in my image, but seems to be related to the way the extents of that particular file are dispersed.
I think what is happening is that, as resize2fs relocates data blocks, it somehow trips on some of the extents, resulting in the situation shown above (note how each of the two logical blocks is reported twice; also, the large physical block numbers were once within the limits of the filesystem, but after the resize they have become invalid).
I may be wrong, but looking at the extents allocated to the file, it seems like it has two particular interior extents (level 1) that, after resize, end up with no leaf extents (level 2).
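To make the "invalid physical block" point concrete, here is a small arithmetic sketch. It assumes a 4 KiB block size, which the report does not state; under that assumption the 1500000000k target corresponds to 375,000,000 blocks, so any physical block number at or above that lies outside the shrunken filesystem. The helper name `block_in_range` is hypothetical.

```shell
#!/bin/sh
# Succeeds if a physical block number lies inside a filesystem of the given size.
# Sizes are in KiB; the 4 KiB block size used below is an assumption.
block_in_range() {
    fs_size_kib=$1
    block_size_kib=$2
    phys_block=$3
    total_blocks=$(( fs_size_kib / block_size_kib ))
    [ "$phys_block" -lt "$total_blocks" ]
}

# Physical block numbers taken from the e2fsck messages in this report.
for blk in 640097 389065080 389073396; do
    if block_in_range 1500000000 4 "$blk"; then
        echo "$blk: inside the shrunken filesystem"
    else
        echo "$blk: beyond the shrunken filesystem"
    fi
done
```

This only shows that the two large block numbers fall past the new end of the filesystem; it says nothing about why resize2fs left the extents pointing there.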
The bug was initially found on e2fsprogs 1.45.6 (stock Ubuntu 22 package) but can be reproduced with fresh binaries compiled from the latest e2fsprogs source (1.47.0).
Rearranging the extents of the file inside the image seems to work around the problem; for example, any of these will allow resize2fs to do its work without corrupting the filesystem:
- Mount the image, duplicate the file, overwrite the original ("cp file file.new && mv file.new file"). I believe this causes the extent tree of file to be replaced by a fresh one, probably tidier.
- Run e2fsck with "-E optimize_extents" prior to resize2fs.
If the bug is in the way resize2fs handles the extents, however, neither of these completely rules out the possibility of corrupting the filesystem anyway.
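The second workaround could be scripted roughly as follows. This is a sketch under assumptions: the report does not say which other e2fsck flags were combined with "-E optimize_extents", so "-f -p" is assumed here, and the `run` wrapper with its `DRY_RUN` variable is a hypothetical convenience for printing the commands without executing them.

```shell
#!/bin/sh
# Sketch of the "-E optimize_extents" workaround; helper names are hypothetical.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

shrink_with_optimized_extents() {
    img=$1
    new_size=$2
    rc=0
    # e2fsck exits non-zero whenever it changes something, so only statuses
    # of 4 and above (uncorrected or operational errors) are treated as fatal.
    run e2fsck -f -p -E optimize_extents "$img" || rc=$?
    if [ "$rc" -ge 4 ]; then
        return "$rc"
    fi
    run resize2fs "$img" "$new_size" || return 1
    # Final check: this is the step that failed in the original reproducer.
    run e2fsck -f -p "$img"
}

# Example (not run here):
#   DRY_RUN=1 shrink_with_optimized_extents work.img 1500000000k
```

As noted above, a clean final e2fsck on one image does not show that the underlying resize2fs problem is avoided in general.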
The ext4 image is available at https://github.com/viavi-ab/e2fsprogs/blob/master/ab_test/sanitized_e2.img.zst (~90MB).
The script that reproduces the bug is sanitized_reproducer (requires unzstd to extract the raw image; uncompressed image is a 1.8TB sparse file).
Attached are the outputs of the debugfs "stat" and "extents" commands before and after the resize:
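For reference, comparable output can presumably be captured along these lines. This is a sketch, not the exact commands used for the attachments; the helper name `dump_inode_state` and the output file names are hypothetical, and inode 12 is taken from the e2fsck messages above.

```shell
#!/bin/sh
# Sketch: save the debugfs view of one inode so two runs can be diffed.
dump_inode_state() {
    img=$1      # image file, e.g. work.img
    inode=$2    # inode number, e.g. 12
    prefix=$3   # output file prefix, e.g. before or after
    # debugfs opens the filesystem read-only unless -w is given.
    debugfs -R "stat <$inode>" "$img" > "$prefix.stat.txt"
    debugfs -R "extents <$inode>" "$img" > "$prefix.extents.txt"
}

# Example (not run here):
#   dump_inode_state work.img 12 before
#   resize2fs work.img 1500000000k
#   dump_inode_state work.img 12 after
#   diff -u before.extents.txt after.extents.txt
```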
Hi, I believe this is connected to #145. I'm not a maintainer and don't have a working solution, but maybe you could take a look at it.
Hi @antmat, I think I've found the bug and am putting together a patch.
What’s the preferred way of submitting a patch for review?
Hi! Unfortunately I'm not a maintainer of this project; I'm just experiencing a related issue. I believe @tytso could help. Maybe the right way is to submit the patch to the kernel mailing list, but I don't know ¯_(ツ)_/¯