grml/grml-debootstrap

Unaligned partitions with `--vm` and device target

Closed this issue · 7 comments

lxp commented

I am running Debian bullseye with grml-debootstrap 0.96 and parted 3.4-1.
When running with --vm and a device file as target, unaligned partitions are created.

This seems to be caused by the non-optimal start and end arguments used in the mkpart command:

parted -s "${TARGET}" 'mkpart primary ext4 2M -1'

I tried to analyse it further by manually running parted and fdisk:

$ sudo parted /dev/vgcrypt/test
GNU Parted 3.4
Using /dev/dm-4
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel msdos
(parted) mkpart primary ext4 2M -1
Warning: The resulting partition is not properly aligned for best performance: 3906s % 4096s != 0s
Ignore/Cancel? i
(parted) print
Model: Linux device-mapper (linear) (dm)
Disk /dev/dm-4: 21,5GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      2000kB  21,5GB  21,5GB  primary  ext4         lba

(parted)

$ sudo fdisk /dev/vgcrypt/test                           

Welcome to fdisk (util-linux 2.36.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): p
Disk /dev/vgcrypt/test: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 2097152 bytes
Disklabel type: dos
Disk identifier: 0xf08bb2c7

Device             Boot Start      End  Sectors Size Id Type
/dev/vgcrypt/test1       3906 41941087 41937182  20G 83 Linux

Partition 1 does not start on physical sector boundary.

The start 2M is interpreted as 2 megabytes (2000000 bytes or 3906 sectors) from the disk start.
The end -1 is interpreted as 1 megabyte (1000000 bytes or 1953 sectors) before the disk end.
I suspect there was some change in the alignment handling or constraint solving between parted 3.2 and parted 3.4, but I could not find the actual cause of the different behaviour.
As the given parameters are suboptimal anyway, I suggest changing the mkpart command to mkpart primary ext4 2MiB 100%.

2MiB is interpreted as 2 mebibytes (2097152 bytes or 4096 sectors) from the disk start.
100% is interpreted as the disk end.
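
The arithmetic behind these two start values can be checked with a short sketch. It assumes the 512-byte logical sector size and the 2097152-byte optimal I/O size reported by fdisk above, and it truncates byte offsets to whole sectors, which matches the start sectors shown in the transcripts. The helper name `start_sector` is made up for this illustration and is not part of parted or grml-debootstrap.

```python
SECTOR_SIZE = 512          # logical sector size reported by parted and fdisk
OPTIMAL_IO_SIZE = 2097152  # optimal I/O size reported by fdisk on this system

# Alignment grain in sectors: 2097152 / 512 = 4096
GRAIN = OPTIMAL_IO_SIZE // SECTOR_SIZE


def start_sector(offset_bytes: int) -> int:
    """Truncate a byte offset to a whole sector number (illustrative helper)."""
    return offset_bytes // SECTOR_SIZE


# "2M" is decimal (2 * 1000^2 bytes), "2MiB" is binary (2 * 1024^2 bytes).
for label, offset in (("2M", 2 * 1000**2), ("2MiB", 2 * 1024**2)):
    sector = start_sector(offset)
    print(f"{label}: sector {sector}, {sector}s % {GRAIN}s = {sector % GRAIN}s")
# 2M: sector 3906, 3906s % 4096s = 3906s
# 2MiB: sector 4096, 4096s % 4096s = 0s
```

The non-zero remainder for 2M is the same condition that parted warns about (3906s % 4096s != 0s).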

These parameters also lead to correct alignment on my system:

$ sudo parted /dev/vgcrypt/test
GNU Parted 3.4
Using /dev/dm-4
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel msdos
Warning: The existing disk label on /dev/dm-4 will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
(parted) mkpart primary ext4 2MiB 100%
(parted) print
Model: Linux device-mapper (linear) (dm)
Disk /dev/dm-4: 21,5GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      2097kB  21,5GB  21,5GB  primary  ext4         lba

(parted) align-check opt 1
1 aligned

$ sudo fdisk /dev/vgcrypt/test                           

Welcome to fdisk (util-linux 2.36.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): p
Disk /dev/vgcrypt/test: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 2097152 bytes
Disklabel type: dos
Disk identifier: 0x868517c9

Device             Boot Start      End  Sectors Size Id Type
/dev/vgcrypt/test1       4096 41943039 41938944  20G 83 Linux

jkirk commented

We just tried to reproduce this in a VirtualBox VM running Grml 2021.07 (with a very small LV), but there the partition is aligned.

root@grml ~ # parted /dev/vg0/test   
GNU Parted 3.4
Using /dev/dm-1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel msdos                                                    
(parted) mkpart primary ext4 2M -1                                        
(parted) p                                                                
Model: Linux device-mapper (linear) (dm)
Disk /dev/dm-1: 8389kB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      2097kB  7340kB  5243kB  primary  ext4         lba

(parted) q                                                                
Information: You may need to update /etc/fstab.

root@grml ~ # fdisk -l /dev/vg0/test
Disk /dev/vg0/test: 8 MiB, 8388608 bytes, 16384 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xbb8a784a

Device         Boot Start   End Sectors Size Id Type
/dev/vg0/test1       4096 14335   10240   5M 83 Linux
root@grml ~ # lvdisplay vg0/test
  --- Logical volume ---
  LV Path                /dev/vg0/test
  LV Name                test
  VG Name                vg0
  LV UUID                9kue8b-pQoe-DKhl-amOD-mrwp-m04c-FG9hMa
  LV Write Access        read/write
  LV Creation host, time grml, 2021-12-03 09:48:39 +0000
  LV Status              available
  # open                 1
  LV Size                8.00 MiB
  Current LE             2
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:1

The LV used is very small; I will test with a larger disk too.

Using 100% as end-of-disk is definitely a good idea. I'd also use 2MiB instead of 2M (as the mebibyte is the unit we want), but I'd like to understand why our examples behave differently.

jkirk commented

Reading https://www.gnu.org/software/parted/manual/html_node/unit.html:

Parted will compute sensible ranges for the locations you specify (e.g., a range of +/- 500 MB when you specify the location in “G”, and a range of +/- 500 KB when you specify the location in “M”) and will select the nearest location in this range from the one you wrote that satisfies constraints from both the operation, the filesystem being worked on, the disk label, other partitions and so on.

and:

Note that as of parted-2.4, when you specify start and/or end values using IEC binary units like “MiB”, “GiB”, “TiB”, etc., parted treats those values as exact, and equivalent to the same number specified in bytes (i.e., with the “B” suffix), in that it provides no “helpful” range of sloppiness.
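
As a rough illustration of the two quoted passages, the sketch below models the range of byte offsets that a location would allow. It is a simplified reading of the manual text only, not parted's actual code: it assumes the slack is half a unit on either side for the SI suffixes "M" and "G" (the two examples the manual gives) and zero for the IEC suffixes. The function name `acceptable_range` is hypothetical.

```python
# Bytes per unit. SI suffixes are decimal, IEC suffixes are binary.
SI_UNITS = {"M": 1000**2, "G": 1000**3}
IEC_UNITS = {"MiB": 1024**2, "GiB": 1024**3}


def acceptable_range(location: str) -> tuple:
    """Return (lowest, highest) byte offset this simplified model accepts."""
    for suffix, unit in IEC_UNITS.items():
        if location.endswith(suffix):
            exact = int(location[: -len(suffix)]) * unit
            return exact, exact  # IEC units: exact, no range of sloppiness
    for suffix, unit in SI_UNITS.items():
        if location.endswith(suffix):
            centre = int(location[: -len(suffix)]) * unit
            slack = unit // 2  # +/- 500 KB for "M", +/- 500 MB for "G"
            return centre - slack, centre + slack
    raise ValueError(f"unsupported location: {location!r}")


print(acceptable_range("2M"))    # (1500000, 2500000)
print(acceptable_range("2MiB"))  # (2097152, 2097152)
```

Under this model, the exact 2MiB offset (2097152 bytes) lies inside the range allowed by 2M.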

Still not sure why your partition started at 2000kB when using 2M. @lxp Could you please test mkpart with some different disk sizes?

But after reading this and this I'll even propose going for 4MiB. tl;dr:

Cheap flash drives will be with us for a long time to come, and, for them, 1MiB alignment is not enough. Use at least 4MiB-aligned partitions.

@mika?

mika commented

Great find and bug report, thx @lxp - thanks also for looking into this, @jkirk - definitely an interesting find. :)

I couldn't reproduce the issue either from a running Grml live system, but usage of mkpart primary ext4 4MiB 100% sounds like a good plan to me and is hopefully safe for all use cases. So let's do this?
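
A quick check of what a 4MiB start implies, as a minimal sketch: it assumes 512-byte sectors and tests the offset against the alignments mentioned in this thread (1MiB, the 2097152-byte optimal I/O size from the original report, and 4MiB). The helper `is_aligned` is made up for this illustration.

```python
MIB = 1024**2
SECTOR_SIZE = 512  # assumed logical sector size, as in the transcripts above


def is_aligned(offset_bytes: int, alignment_bytes: int) -> bool:
    """True if the offset is a whole multiple of the alignment."""
    return offset_bytes % alignment_bytes == 0


start = 4 * MIB
print(start // SECTOR_SIZE)  # 8192 (start sector for a 4MiB offset)

# 1MiB, the reported optimal I/O size, and 4MiB itself
for alignment in (1 * MIB, 2097152, 4 * MIB):
    print(alignment, is_aligned(start, alignment))  # True for all three

# A 2MiB start satisfies the first two alignments but not the 4MiB one.
print(is_aligned(2 * MIB, 4 * MIB))  # False
```

The cost is the unused space before the partition: 4MiB instead of 2MiB.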

zeha commented

Unaligned starts are sometimes "caused" by the underlying storage devices reporting crazy values; unfortunately that's sometimes also hidden deeper down in the stack. I vaguely remember seeing similar problems with fdisk and some USB mass storage devices.

lxp commented

@jkirk I tried to reproduce it on a different system, but couldn't reproduce it either. However, I noticed a critical difference. In my original test, fdisk reported the following:

I/O size (minimum/optimal): 524288 bytes / 2097152 bytes

In your test and also in my try on a different system, fdisk reports the following:

I/O size (minimum/optimal): 512 bytes / 512 bytes
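
To compare systems without going through fdisk, the I/O hints can also be read from sysfs. The sketch below assumes a Linux system where the queue attributes `minimum_io_size` and `optimal_io_size` exist under `/sys/block/<name>/queue/`; the kernel name `dm-4` is only the example from the first transcript, and the helper `read_io_sizes` is hypothetical. Note that sysfs may report 0 for `optimal_io_size` when the device gives no hint, so the raw value need not match the number fdisk prints.

```python
from pathlib import Path


def read_io_sizes(kernel_name: str, sysfs_root: str = "/sys/block") -> tuple:
    """Return (minimum_io_size, optimal_io_size) in bytes for a block device."""
    queue = Path(sysfs_root) / kernel_name / "queue"
    minimum = int((queue / "minimum_io_size").read_text().strip())
    optimal = int((queue / "optimal_io_size").read_text().strip())
    return minimum, optimal


if __name__ == "__main__":
    try:
        # "dm-4" is the kernel name parted printed above; adjust as needed.
        print(read_io_sizes("dm-4"))
    except FileNotFoundError:
        print("device not present on this system")
```

The `sysfs_root` parameter exists only so the function can be pointed at a different directory, for example in a test.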

Once I have access to the original test system again, I will try to further analyse the situation. However, I can already tell that it uses some more complicated RAID5 or RAID6 setup with SAS disk drivers, so that might be the cause.

jkirk commented

@lxp Yes, I noticed that too. But the optimal I/O size of 2097152 bytes is exactly 2MiB, which is still within "a range of +/- 500 KB when you specify the location in “M”". So maybe this is just a problem with how parted tries to apply its magic to "select the nearest location in this range [...] that satisfies constraints [...]"... 🤷🏾
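
The expectation described here can be written down as a small sketch. It is a naive model of "select the nearest location in this range", not parted's constraint solver: it picks the multiple of the alignment grain closest to the requested offset and accepts it only if it lies within the slack. The function name `nearest_aligned` and the model itself are assumptions made for illustration.

```python
def nearest_aligned(target: int, slack: int, grain: int):
    """Return the multiple of grain nearest to target, or None if outside slack."""
    below = (target // grain) * grain
    above = below + grain
    best = below if target - below <= above - target else above
    return best if abs(best - target) <= slack else None


REQUESTED = 2 * 1000**2  # "2M"
SLACK = 500 * 1000       # +/- 500 KB, as quoted from the manual

# Grain of 2097152 bytes (optimal I/O size on the system from the report)
print(nearest_aligned(REQUESTED, SLACK, 2097152))  # 2097152
# Grain of 1 MiB
print(nearest_aligned(REQUESTED, SLACK, 1024**2))  # 2097152
```

Under this naive model both grains lead to a start at 2097152 bytes (sector 4096), which is what the Grml test produced. The start at sector 3906 on the original system is therefore not explained by the range rule alone.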

This does not seem to be the only problem: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923561

So yes, let's go for 4MiB, this should do fine for us.

mika commented

So what do you think of #190?