Vanilla-OS/Albius

Installer trying to access an unprovided partition

Closed this issue · 8 comments

[Photo: 20240117_203739]
I asked the installer to install to partitions 6, 7, 8, and 9, yet it is trying to access p5, which was an lvm2 partition (the photo says it's fat32).
I followed this blog post when partitioning the disk.

I may have run into a related issue. My intention was to dual boot with Windows. sda1 to sda6 already existed (if I recall correctly, sda1 to sda4 were in the low sectors, and sda5 and sda6 were at the end of the drive; this is not an SSD, it is a spinning hard drive). I first shrank the Windows partition from Windows. Then I used Ventoy to boot into VanillaOS-2-testing.20240130.iso. During the install I chose manual partitioning and opened GParted to create the following:

  • sda7 - vosboot ext4 1024 MB
  • sda8 - vosefi fat32 512 MB
  • sda9 - vosroot unformatted 20992 MB
  • sda10 - vosvar 146 GB

Then I selected each of those in the installer (did not enable Swap) and started the install. On the 14th step the install panicked because there was no space left on /var/tmp during "Copying blob". A closer look at the log showed that at the start of step 14 it ran btrfs-progs to perform a full device TRIM on /dev/sda6, which was 200 MB.

sda6 was not one of the partitions I chose. I think that sda6 was the ASUS laptop recovery partition.

I reopened GParted and saw:

  • sda1 through sda3 look the same as before
  • sda7 has no Mount Point or Label, with 49.5 MB used (should have been vos-boot)
  • sda8 looks untouched (should be vos-efi)
  • sda9 shows a Mount Point of vos-root and no Label, which seems wrong; it does show as lvm2 pv, with all 20.5 GB in use
  • sda10 was supposed to be vosvar btrfs, but it now shows ext4 with a Mount Point of /mnt/a/boot, a Label of vos-boot, and 3.36 GB in use
  • unallocated - there is supposed to be an unallocated region on this drive, but it should be between sda6 and sda7
  • sda4 shows fat32 with a Mount Point of /mnt/a/boot/efi and 1.93 MB used. This partition should not have been touched. It is now located after unallocated, which implies the sector numbers changed. I think this was a Swap partition for a previously installed Debian. I deleted the Debian partition but left the Swap partition because I wasn't positive it was safe to delete.
  • sda5 looks untouched, except that it also seems to have new sector numbers placing it after unallocated
  • sda6 is now btrfs (it wasn't before, but I can't recall what it was) and has 3 Mount Points listed: /mnt/a/var, /mnt/a/var/storage/graph/overlay, and /var/tmp. 144 KB in use.

Just trying to think of possible contributing factors that could help reproduce:

In reference to the partition tables: a few days ago this laptop was still factory original (but from 2021). A couple of days ago I installed Ubuntu 22.04 on it, letting the Ubuntu installer shrink the Windows partition (I think) to set up dual boot. I immediately regretted not using Debian, so I installed Debian over Ubuntu (reformatting that partition). Then I saw the post about the Vanilla OS 2 beta and decided to start over. I wanted to shrink the Windows partition again, and thought the right way to do it was with the Windows tool in Computer Management, both to shrink the Windows partition and to delete the Debian one. Windows said there was an issue with the Windows partition and told me to run chkdsk, which I assumed was just Windows not knowing that I had shrunk it from the Ubuntu installer. After a reboot it was fine, so I shrank the Windows partition, deleted the Debian partition, and then began installing Vanilla OS.

During the Vanilla OS install the installer crashed (no error, it just disappeared), but that was before I got to the partitioning step. Instead of rebooting I reopened the installer: I clicked the top right of the screen where the power button is, clicked the Settings gear, opened the Apps tab, found the Vanilla OS Installer there, and clicked its Open button. Then I went through the install a second time.
[Screenshot from 2024-02-02 00-02-54]
[Screenshot from 2024-02-02 00-02-25]
[Screenshot from 2024-02-01 23-51-12]

Couple notes:

I retried the install twice. I have sda1 to sda10, and the selection menu in the installer lists them in alphabetical order (1 10 2 3 4 5 6 7 8 9), with 10 between 1 and 2. That made me think the problem could be related to the installer assigning partition numbers in alphabetical instead of numerical order. But Bob only has 9 partitions.
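To illustrate the menu ordering described above, here is a small Python sketch. It is not the installer's code (which is not shown in this thread); the device names are just the ten partitions from this report, and `natural_key` is a hypothetical helper name.

```python
import re

def natural_key(name: str):
    """Split a device name into text and integer chunks so 'sda10' sorts after 'sda9'."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]

# The ten partitions from this report.
partitions = [f"/dev/sda{n}" for n in range(1, 11)]

# Plain string sorting compares character by character, so "sda10" lands
# right after "sda1".
alphabetical = sorted(partitions)

# Sorting on the numeric suffix gives the order a user would expect.
numerical = sorted(partitions, key=natural_key)

print([p.removeprefix("/dev/sda") for p in alphabetical])
print([p.removeprefix("/dev/sda") for p in numerical])
```

As noted, this ordering alone would not explain a failure on a disk with only 9 partitions.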

Bob and I do have one thing in common: our partitions are not in alphabetic or numeric order in GParted. His goes 1 2 3 4 6 7 8 9 5 and mine goes 1 2 3 7 8 9 10 4 5 6. He also said his lvm2 partition is showing as fat32, and my btrfs shows as ext4.

I made a mistake about where the unallocated space was showing in GParted. It was correct. Ignore everything I said about sectors and partition order. 1 2 3 7 8 9 10 unallocated 4 5 6 is correct. That makes me speculate that Bob's layout is also correct, with 5 at the end and not in the middle.

That makes me think it could be some sort of array index mismatch, where one side assumes partition index order matches alphabetic or numeric order, and the other side expects it to match where the partition is stored on the platter (meaning sector order).

This could absolutely be the problem. Thanks for the observation.

This is serious and I can reproduce this.
This needs to be fixed ASAP.

Could you test if this is fixed in the latest iso?
https://github.com/Vanilla-OS/live-iso/actions/runs/7763576067

Warning: Keep in mind that it's still dangerous, so don't do it if you have important data on the disk.

@BobbedBob @ericwikman

This did seem to fix it!

I'm glad you were able to reproduce and fix it. I too am able to reproduce the (hopefully former) issue reliably now. I am curious how you fixed it, and if I can see the commit for my own education.

I was able to determine that it did destroy the MyASUS WinRE recovery partition (no big deal, I knew this was a beta). I decided to just let VOS wipe my drive completely clean and do the full drive install, which worked fine.

Once the drive was wiped it gave me the opportunity to try many different ways to reproduce the issue. The method I used was to delete all partitions, then create a 10 GB partition at the start of the drive and a 10 GB partition at the end of the drive. Then I created the 4 partitions for VOS directly after the first partition, so the order ends up being 1 3 4 5 6 2. It destroys partition 2 and the install fails.

If there is a partition at the end of the drive before you create the VOS partitions, it won't work. I suspect that if I were to create 3 partitions on a blank drive, delete the 2nd partition, and then create the 4 VOS partitions, it would end up something like 1 3 2 4 5 6 (if there isn't enough space between 3 and 2 to fit 4/5/6) and would fail as well, but I did not try it. The same goes for manually setting the sectors for each partition in a funky order.

It does not seem to matter how many partitions are before or after the VOS ones as long as they are all in sector order (1 2 3 4 5 6 7 8, where 3-6 are VOS), that installs fine.
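The condition reported above (installs work when partition numbers follow sector order, and fail otherwise) can be written as a small check. This is a sketch of the observation from these tests, not a verified description of the installer's behavior; the `Partition` type, the helper names, and the start sectors are hypothetical.

```python
from typing import Iterable, List, NamedTuple

class Partition(NamedTuple):
    number: int        # the N in /dev/sdaN
    start_sector: int  # where the partition begins on the disk

def layout_from_disk_order(numbers: Iterable[int]) -> List[Partition]:
    """Build a layout from partition numbers listed in on-disk order.

    The start sectors are made up; only their relative order matters here.
    """
    return [Partition(number=n, start_sector=2048 + i * 1_000_000)
            for i, n in enumerate(numbers)]

def numbers_follow_sector_order(partitions: List[Partition]) -> bool:
    """True when walking the disk by sector visits partition numbers in ascending order."""
    by_sector = sorted(partitions, key=lambda p: p.start_sector)
    numbers = [p.number for p in by_sector]
    return numbers == sorted(numbers)

in_order = layout_from_disk_order(range(1, 9))          # 1 2 3 4 5 6 7 8
repro = layout_from_disk_order([1, 3, 4, 5, 6, 2])      # reproduction layout
bob = layout_from_disk_order([1, 2, 3, 4, 6, 7, 8, 9, 5])

for name, layout in [("in order", in_order), ("repro", repro), ("bob", bob)]:
    print(name, numbers_follow_sector_order(layout))
```

On a real system the number and start sector of each partition can be read from the partition table instead of being constructed by hand.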

I probably did like a dozen install attempts today, so I've gotten pretty familiar with the process and have a few items of feedback (if this is an acceptable place for now).

  1. I found that I had to reboot after using GParted to create the 4 VOS partitions. If I did not, the install would fail, but not always in the same manner. This is a screenshot of one failure: in step 2 it panics because there is no /dev/sda6, although GParted says there is (I did commit the changes to the partition table before continuing with the install):

[Screenshot from 2024-02-03 00-48-09]

  2. This may be more of a GParted issue, but sometimes I could deactivate the vos-root partition so that I could delete it, and sometimes I could not. When deactivating did not work, I could run lvremove from the command line to release the lock on the partition and then delete it from GParted.

  3. Also probably a GParted issue: if I deleted the 4 partitions, rebooted, and then added them back in the exact same order and size, the deleted partitions returned to their former state. I create vos-root as unformatted, but after I commit the changes it magically becomes lvm2. This happens even if I first tell GParted to format the partition as ext2, then commit, reboot, delete the partition, commit, reboot, and create the partition again. If it has the same start and end sector, it goes back to its old state.

  4. The laptop is not connected via Ethernet; I'm only using wifi. If I connect to wifi from the installer, the installer eventually crashes, typically before I click the final button to begin the install (although once it actually started installing before it crashed). If I connect to wifi from GNOME before I begin the install process, it does not crash. The same is true of the Vanilla OS First Setup program when it opens the first boot walk-through after the install: if I set up wifi from First Setup it eventually crashes, but if I do it from GNOME it does not. No error message displays, it just disappears.

  5. In the installer, the Configure Disk page says that you don't need a Swap partition for hibernation, but your Nov 22 Devlog blog post (with the partition instructions) says you do. I know the documentation is not up to date yet, but that blog post is linked from the beta announcement blog post.

  6. It would be nice if the language and keyboard selection pages showed at the top which choice is selected, without having to scroll through everything to see that it is English US.

  7. It would be nice if the settings you make in the installer persisted to the First Setup wizard. That is for language, keyboard, timezone, and wifi password.

None of the above are important to me, feel free to ignore. I'm happy you got the issue resolved.

Thanks for testing it.

The issue was an assumption that an array would hold the partitions sda1, sda2, and so on in order.
Since this is not always the case, it would, for example, select the second partition on your disk because you selected sda2, whether or not that was actually sda2.
This is the commit that fixed it:
75ca3dc
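A minimal Python sketch of the mismatch described above. It is not the code from that commit, which is not quoted in this thread; `Partition`, `pick_by_index`, and `pick_by_number` are hypothetical names, and the layout is the 1 3 4 5 6 2 reproduction reported earlier.

```python
from typing import List, NamedTuple

class Partition(NamedTuple):
    number: int  # the N in /dev/sdaN
    path: str

# Partitions listed in on-disk order, as in the reproduction: 1 3 4 5 6 2.
disk_order = [Partition(n, f"/dev/sda{n}") for n in (1, 3, 4, 5, 6, 2)]

def pick_by_index(partitions: List[Partition], number: int) -> Partition:
    """Flawed lookup: assumes list position number-1 holds partition `number`."""
    return partitions[number - 1]

def pick_by_number(partitions: List[Partition], number: int) -> Partition:
    """Safer lookup: match on the partition's own number, ignoring list position."""
    for part in partitions:
        if part.number == number:
            return part
    raise LookupError(f"no partition numbered {number}")

# The user selects sda3..sda6 for the install.
for selected in (3, 4, 5, 6):
    wrong = pick_by_index(disk_order, selected).path
    right = pick_by_number(disk_order, selected).path
    print(f"selected sda{selected}: by index -> {wrong}, by number -> {right}")
```

Under this assumption, selecting sda6 resolves to /dev/sda2 when looked up by index, which matches the report that partition 2 was destroyed in that reproduction.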

And no, each issue should cover only one topic, so if you care about the other items, please open new issues for them (except for 2 and 3, since they are GParted issues).