TheRemote/MinecraftBedrockServer

World corruption

xamerintime opened this issue · 20 comments

About every 3 days, my world gets corrupted on my server. I can fix this simply by restarting the server, after which it lets me join and no longer logs corruption errors.

I'm not sure if this is an issue with Minecraft itself or this project, but I figured I'd report it anyway.

[2021-07-26 04:00:37] NO LOG FILE! - setting up server logging...
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Starting Server
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Version 1.17.10.04
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Session ID 9f9660a2-e66d-40d0-94ef-4d0c01cd75c4
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Level Name: CrackRock
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Game mode: 0 Survival
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Difficulty: 2 NORMAL
[2021-07-26 04:00:37] [INFO] opening worlds/CrackRock/db
[2021-07-26 04:00:37] [WARN] LevelDB worlds/CrackRock/db status NOT OK(Corruption: 26 missing files; e.g.: worlds/CrackRock/db/019306.ldb). Trying repair.
[2021-07-26 04:00:39] [INFO] IPv4 supported, port: 0
[2021-07-26 04:00:39] [INFO] IPv6 supported, port: 0
[2021-07-26 04:00:39] [INFO] Package: com.mojang.minecraft.dedicatedserver
[2021-07-26 04:00:39] Version: 1.17.10.04
[2021-07-26 04:00:39] OS: Linux
[2021-07-26 04:00:39] Server start: 2021-07-26 04:00:37 CDT
[2021-07-26 04:00:39] Dmp timestamp: 2021-07-26 04:00:39 CDT
[2021-07-26 04:00:39] Upload Date: 2021-07-26 04:00:39 CDT
[2021-07-26 04:00:39] Session ID: 9f9660a2-e66d-40d0-94ef-4d0c01cd75c4
[2021-07-26 04:00:39] Commit hash: 6c75de4d333599a7a426864d3782d5bc9e6a8ef8
[2021-07-26 04:00:39] Build id: 6472255
[2021-07-26 04:00:39] CrashReporter Key: f660aa7f-cc53-3531-b873-e1261ca7e818
[2021-07-26 04:00:39]
[2021-07-26 04:00:39] Crash
[2021-07-26 04:00:39] [INFO]
Failed to open curl lib from binary, use libcurl.so instead
[2021-07-26 04:00:37] NO LOG FILE! - setting up server logging...
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Starting Server
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Version 1.17.10.04
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Session ID 7e6e0d5b-c308-448f-a03a-d8fe98a99d5c
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Level Name: CrackRock
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Game mode: 0 Survival
[2021-07-26 04:00:37] [2021-07-26 04:00:37 INFO] Difficulty: 2 NORMAL
[2021-07-26 04:00:37] [INFO] opening worlds/CrackRock/db
[2021-07-26 04:00:37] [WARN] LevelDB worlds/CrackRock/db status NOT OK(Corruption: 26 missing files; e.g.: worlds/CrackRock/db/019306.ldb). Trying repair.
[2021-07-26 04:00:39] [INFO] IPv4 supported, port: 19132
[2021-07-26 04:00:39] [INFO] IPv6 supported, port: 19133
[2021-07-26 04:00:39] [INFO] IPv4 supported, port: 39149
[2021-07-26 04:00:39] [INFO] IPv6 supported, port: 42091
[2021-07-26 04:00:40] [INFO] Server started.
[2021-07-26 04:06:39] [INFO] Running AutoCompaction...
[2021-07-26 14:36:39] [INFO] Player connected: Fich42, xuid: ********
[2021-07-26 14:36:44] [INFO] Package: com.mojang.minecraft.dedicatedserver
[2021-07-26 14:36:44] Version: 1.17.10.04
[2021-07-26 14:36:44] OS: Linux
[2021-07-26 14:36:44] Server start: 2021-07-26 04:00:37 CDT
[2021-07-26 14:36:44] Dmp timestamp: 2021-07-26 14:36:44 CDT
[2021-07-26 14:36:44] Upload Date: 2021-07-26 14:36:44 CDT
[2021-07-26 14:36:44] Session ID: 7e6e0d5b-c308-448f-a03a-d8fe98a99d5c
[2021-07-26 14:36:44] Commit hash: 6c75de4d333599a7a426864d3782d5bc9e6a8ef8
[2021-07-26 14:36:44] Build id: 6472255
[2021-07-26 14:36:44] CrashReporter Key: f660aa7f-cc53-3531-b873-e1261ca7e818
[2021-07-26 14:36:44]
[2021-07-26 14:36:44] Crash
[2021-07-26 14:36:44] [INFO] at clone (UnknownFile:?)
Failed to open curl lib from binary, use libcurl.so instead

It's a permissions error for sure. The last line is the dead giveaway. Curl is actually inside the Minecraft server (bedrock_server, the giant blob, it's a statically built binary so it's actually bedrock_server itself that is missing +x) so you need to run:

./fixpermissions.sh

from inside your server folder and that will eliminate these errors! It's probably coming out of the .zip file that way. They used to have the +x bit on the server within the file and at some point that got taken out (or they only remember to do it sometimes) and it's unlikely I'll be able to entirely eliminate some updates causing that unless I add passwordless sudo commands to the script that allow it to take ownership of a certain directory without root permissions but it gets real tricky and messes with core security systems to allow the commands.

It's almost always going to be permissions issues with Bedrock (especially on Linux). If you ever see a library such as curl being referenced, that is also a permissions error, because bedrock_server is static and there are no separate libraries. That's why the bedrock_server executable is enormous. It's all in there! I've seen only 2 actual bugs in the few years (at least) that this script has been around!

Hopefully that helps. It's possible you have both the curl permissions bug and some missing world files, but I'm guessing ./fixpermissions.sh will eliminate that as well. It's far more likely that some files in there have the wrong permissions and the server can't read them, and that's why it says they're "missing". If you are literally missing world files and ./fixpermissions.sh doesn't eliminate that, then I would strongly recommend running fsck and doing a complete diagnostic/checkup on that system's storage. This does not just happen (new bugs are always possible, but historically there have been very specific bad attributes and the like that make it crash, never "missing files" like your logs show). So if you're literally missing files and it's not just permissions, definitely make some backups on a different drive, check the disk, etc.

Give that a try and let me know if that gets it for you!
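For anyone who wants to see the state of things before running the fix, here is a minimal sketch of a read-only check. The function names are hypothetical and not part of this project's scripts. It only reports whether bedrock_server still has its execute bit and which world files the current user cannot read (the latter relies on GNU find's -readable test).

```shell
#!/bin/sh
# Hypothetical helpers, not part of the project's scripts.

# Report whether bedrock_server in the given server folder is executable.
check_server_exec() {
    server_dir="$1"
    if [ ! -f "$server_dir/bedrock_server" ]; then
        echo "missing"
    elif [ -x "$server_dir/bedrock_server" ]; then
        echo "executable"
    else
        echo "not executable"
    fi
}

# List world files that the current user cannot read (GNU find only).
list_unreadable_world_files() {
    find "$1/worlds" -type f ! -readable 2>/dev/null
}
```

Run from inside the server folder, that would look like: check_server_exec . followed by list_unreadable_world_files . (both only read, they change nothing).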

Thank you so much. I ran this command and the world is working fine (but it also was after the first restart that prompted me to make this thread). I'll update here if it corrupts again. I do keep off-site backups of the world, so I'm not too worried about losing it. The drive is a Kingston A400 purchased about 3 months ago, so it shouldn't be a drive health issue. I think you were spot on with ./fixpermissions.sh though.

Thanks for the followup! I'm relieved and confident this will get rid of that "curl" related crash 100% for certain. I'm not as sure about the missing map files but it would make sense that this is also permissions related.

I thought more about this issue after my initial reply and I'm very heavily leaning toward automating this. It will require creating a /etc/sudoers.d/minecraftbe config file that allows fixpermissions to run at startup without having to enter the sudo password. This can be safe if done correctly but I've been avoiding it due to the sensitive nature of sudo. I do believe it can be done safely though if the commands are restricted all the way down to the individual parameters.

I also believe this is something the script should take care of. We can't have everyone running fixpermissions.sh after all/most updates, as that is bad design (and it was not necessary back when the files unzipped with the correct +x permissions).

Let's leave this open for a few days and make sure your missing level files issue doesn't return. If it does, we will want to look at your older backups and see if these files are present in there. Depending on how many backups you have, we should also be able to figure out roughly when they disappeared. Theoretically you could just replace these files from an older backup if they did disappear somehow, but make sure to run ./fixpermissions.sh after doing this, as moving/editing/copying them around will recreate the same permissions issues.

I think there's an excellent chance you're good to go though. If you wouldn't mind letting us know in a few days if this returned or not I would appreciate it. Thanks again!

Same issue again this morning.

Was fixed by just restarting server (no ./fixpermissions.sh needed)

[2021-07-28 04:00:38] NO LOG FILE! - setting up server logging...
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Starting Server
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Version 1.17.10.04
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Session ID 921870d8-aa2c-4b86-97e0-928848e8a538
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Level Name: CrackRock
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Game mode: 0 Survival
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Difficulty: 2 NORMAL
[2021-07-28 04:00:38] [INFO] opening worlds/CrackRock/db
[2021-07-28 04:00:38] [WARN] LevelDB worlds/CrackRock/db status NOT OK(Corruption: 28 missing files; e.g.: worlds/CrackRock/db/020083.ldb). Trying repair.
[2021-07-28 04:00:40] [INFO] IPv4 supported, port: 0
[2021-07-28 04:00:40] [INFO] IPv6 supported, port: 0
[2021-07-28 04:00:40] [INFO] Package: com.mojang.minecraft.dedicatedserver
[2021-07-28 04:00:40] Version: 1.17.10.04
[2021-07-28 04:00:40] OS: Linux
[2021-07-28 04:00:40] Server start: 2021-07-28 04:00:38 CDT
[2021-07-28 04:00:40] Dmp timestamp: 2021-07-28 04:00:40 CDT
[2021-07-28 04:00:40] Upload Date: 2021-07-28 04:00:40 CDT
[2021-07-28 04:00:40] Session ID: 921870d8-aa2c-4b86-97e0-928848e8a538
[2021-07-28 04:00:40] Commit hash: 6c75de4d333599a7a426864d3782d5bc9e6a8ef8
[2021-07-28 04:00:40] Build id: 6472255
[2021-07-28 04:00:40] CrashReporter Key: f660aa7f-cc53-3531-b873-e1261ca7e818
[2021-07-28 04:00:40]
[2021-07-28 04:00:40] Crash
[2021-07-28 04:00:40] [INFO]
Failed to open curl lib from binary, use libcurl.so instead
[2021-07-28 04:00:38] NO LOG FILE! - setting up server logging...
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Starting Server
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Version 1.17.10.04
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Session ID 9b7d5f0a-fcbc-4f4f-91ef-5332dcda9338
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Level Name: CrackRock
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Game mode: 0 Survival
[2021-07-28 04:00:38] [2021-07-28 04:00:38 INFO] Difficulty: 2 NORMAL
[2021-07-28 04:00:38] [INFO] opening worlds/CrackRock/db
[2021-07-28 04:00:38] [WARN] LevelDB worlds/CrackRock/db status NOT OK(Corruption: 28 missing files; e.g.: worlds/CrackRock/db/020083.ldb). Trying repair.
[2021-07-28 04:00:40] [INFO] IPv4 supported, port: 19132
[2021-07-28 04:00:40] [INFO] IPv6 supported, port: 19133
[2021-07-28 04:00:40] [INFO] IPv4 supported, port: 56205
[2021-07-28 04:00:40] [INFO] IPv6 supported, port: 39857
[2021-07-28 04:00:40] [INFO] Server started.
[2021-07-28 04:06:39] [INFO] Running AutoCompaction...
[2021-07-28 12:28:38] [INFO] Player connected: Fich42, xuid: ****
[2021-07-28 12:28:40] [INFO] Package: com.mojang.minecraft.dedicatedserver
[2021-07-28 12:28:40] Version: 1.17.10.04
[2021-07-28 12:28:40] OS: Linux
[2021-07-28 12:28:40] Server start: 2021-07-28 04:00:38 CDT
[2021-07-28 12:28:40] Dmp timestamp: 2021-07-28 12:28:40 CDT
[2021-07-28 12:28:40] Upload Date: 2021-07-28 12:28:40 CDT
[2021-07-28 12:28:40] Session ID: 9b7d5f0a-fcbc-4f4f-91ef-5332dcda9338
[2021-07-28 12:28:40] Commit hash: 6c75de4d333599a7a426864d3782d5bc9e6a8ef8
[2021-07-28 12:28:40] Build id: 6472255
[2021-07-28 12:28:40] CrashReporter Key: f660aa7f-cc53-3531-b873-e1261ca7e818
[2021-07-28 12:28:40]
[2021-07-28 12:28:40] Crash
[2021-07-28 12:28:40] [INFO] at clone (UnknownFile:?)
Failed to open curl lib from binary, use libcurl.so instead
[2021-07-28 12:28:42] 6082167c-ae92-4187-bcc1-7149cf95e536

I would for sure install the latest updates. I did end up automating permissions fixes.

It tells you the first of the files that is missing. worlds/CrackRock/db/020083.ldb. Have you investigated this at all? Does this file exist? Is it in older backups?
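Checking the backups by hand gets tedious with more than a handful of them, so here is a rough sketch of how that could be scripted. It assumes the backups are .tar.gz archives sitting in a single folder, which may not match your setup, and find_in_backups is a made-up name. It does a plain substring match on the archive listing, so a leading ./ in the stored paths does not matter.

```shell
#!/bin/sh
# Hypothetical helper: for every .tar.gz archive in backup_dir, report whether
# its listing contains the wanted path (substring match, read-only).
find_in_backups() {
    backup_dir="$1"
    wanted="$2"
    for archive in "$backup_dir"/*.tar.gz; do
        # Skip the unexpanded pattern when the folder has no archives.
        [ -e "$archive" ] || continue
        if tar -tzf "$archive" | grep -qF -- "$wanted"; then
            echo "present: $archive"
        else
            echo "absent: $archive"
        fi
    done
}
```

Something like find_in_backups backups worlds/CrackRock/db/020083.ldb would then show which archives, if any, still have the file, which also narrows down when it disappeared.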

Have you run a fsck on your system? This is extraordinarily rare if it's real, but I still don't see why you'd be getting the curl errors. It's definitely failing to repair the world since it thinks the files are just gone. It's a little more common to have these problems on Java but nearly unheard of to have them on Bedrock (because it works differently: it's compiled, very low-level C++ code with direct hardware / IO access, while Java works quite differently and has a Virtual Machine etc.).

When you restart the server are you saying that it doesn't give you the corrupt world error? Or does it give you that every time? If it says it every time then that world is for sure corrupt somehow.

I've literally never seen this happen for real in years. You can check all of the issues submitted here etc, you won't even find that many instances of it happening even on Google and when you do it will be "I turned off power to the server while the Minecraft server was still running and it corrupted" like this: https://www.reddit.com/r/Minecraft/comments/gy54ms/corrupted_world_on_my_personal_be_server_only_a/. That is literally what it takes (or a failing disk).

It's extraordinarily rare and if you have a real corrupted world I'd be extremely suspicious of that computer/server. I'd say investigate your backups though and see what is in there. It can be extraordinarily complex to fix these but it's not impossible and that reddit post outlines he went in to the files and deleted references to chunks etc. and this is possible to do if you're willing to spend a very large amount of time learning the structures of the files and how to build/use a few tools.

I've literally never seen this happen. The closest thing I've ever seen is when I tried moving a Windows server to Linux and then moving it back, and I got some corruption errors that were successfully repaired on startup (it gave a status OK after the repair instead of a status NOT OK like yours has). Something is extraordinarily wrong here, and I highly recommend doing the fsck, checking the backups for the files it says are missing, and trying to figure out what happened for sure!

I did just update right after making this post.

These files do not exist, they also don't exist in any backup.

I have not run fsck, as I would have to unmount the drive and it's the system boot drive.

When I restart the server, the corruption error goes away and the world works as normal.

The interesting thing is that it goes right back to normal whenever I restart the server. This also happened on a different world, but I accidentally deleted all the logs so I couldn't report it.

That is very strange. It would indicate that the permissions are changing themselves. The server runs for a certain amount of time, then it writes a change to the world to the disk and all of a sudden the permissions of bedrock_server change. It means at some point bedrock_server is losing +x while the server is running (curl still has to be referenced in this binary, so if it loses +x permissions while running this would happen). Except the script has no mechanism that could possibly make that happen, and nothing else on the system should either.

It would indicate a failing hard drive or a bad partition. As things are getting written permissions seem to be changing/reverting which is a pretty classic symptom for sure. It also makes no sense. The level files can't be missing one moment and then it doesn't care after restarting it unless something like this is going on. I see this behavior all the time on Raspberry Pis where I have a version of this script for Java that runs on them and when impossible things start happening it's a failing SD card (their version of a disk). Sometimes fsck/reimaging it will get it but other times it's low level hardware faults and the card/disk needs to be replaced.

We're literally getting different behavior on startup and then at some point it's losing permissions to bedrock_server which is triggering the curl error (and likely the missing files that are apparently no longer missing when you restart the server). It could also be something like your disk is dropping from your OS momentarily but if it's your boot drive this will usually mess up/crash the system pretty badly.

With any luck it's just a bad partition table that will be fixed by a fsck. This would be my initial guess because there seems to be a method to the madness of how it's behaving if it makes sense. Something specific is reverting/changing, and it probably even has some specific triggers like a certain portion of the map files being written to. Real literal failing disks are usually a bit more messy and inconsistent.

This is impossible behavior though. The map files can't just be gone one moment and there the next. The server can't have permissions to execute curl, then suddenly not have them, then get them back if you restart the server (which triggers a bunch of writes upon shutdown to save changes). It doesn't make sense, which points to hardware/disk faults for sure, so I would definitely start here with this one!

Okay, when I get a moment I will perform fsck

Sounds good. If that isn't working we'll have you check some things like running:

ls -al

to check permissions when this happens (especially of bedrock_server, which is where all this curl stuff is and should be remaining with +x permissions). Let's wait and see on the fsck though because I think that may just be the end of it.
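Since the problem only shows up every few days, a timestamped record of the permissions may be more useful than a single ls -al after the fact. A minimal sketch, assuming GNU coreutils stat (perm_snapshot is just an illustrative name):

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped line with the mode string,
# owner:group and name of the given file (GNU stat format sequences).
perm_snapshot() {
    # %A = mode such as -rwxr-xr-x, %U:%G = owner and group, %n = file name
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(stat -c '%A %U:%G %n' "$1")"
}

# Example of how it could be left running from inside the server folder
# (commented out so that sourcing this file does not start a loop):
# while true; do perm_snapshot bedrock_server >> perms.log; sleep 300; done
```

If the mode string in perms.log ever changes between two lines, the timestamps on either side can be compared against the server log to see what was being written at that moment.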

These are very serious issues and I'd never hear the end of it here or on jamesachambers.com if this was something that happened very often, but I've never seen anything quite like this one (except for the Java version on Raspberry Pis, because of the much higher likelihood of a budget SD card failing vs. a drive in a desktop/server, as well as the higher likelihood of a Pi having its power yanked during a write to disk, which fsck can repair 95%+ of the time for those). So let's see what turns up from this first!

I am struggling to get fsck running properly. But I do have this https://pastebin.com/raw/mjDwPw4P

This image is probably toast if fsck can't repair it. If it did repair it you would end up having a mix of old and new server files because the files are scattered all over. I was going to tell you to just restore a backup to get around this but I was not expecting the fsck to fail as it definitely just seemed like a standard partition table corruption. Your partition table is corrupt so if it writes to a certain part of the map it will end up overwriting the permissions of other files, etc.

The SMART report looks okay. There are a few errors but nothing out of the ordinary and definitely nothing like what a failing disk looks like (usually thousands of errors in some of those categories where you have 0). If you can't fsck it, the server needs a reimage for sure. This server is currently broken. The report does say there have been 24 unsafe shutdowns, and one of these may have happened during an apt upgrade or while the Minecraft server was running, which kicked this off.

I helped someone with a similar issue who must have spent 20 hours trying to troubleshoot his broken server here: https://jamesachambers.com/minecraft-bedrock-edition-ubuntu-dedicated-server-guide/#comment-11524

It took a while (he was using the server for other things) but I finally convinced him it takes 2 hours to reimage it and you've spent 20 trying to fix a broken server. Even if you succeed you just wasted 18 hours and it's not a smart/productive use of your time to try to fix machines this broken. I strongly suggest this gets reimaged and backups restored if it won't take a fsck!

This is the only other one I've seen as broken as this one and once he reimaged everything worked 1000x better and faster anyway because it's likely you have a bunch of other issues going on as well since that partition table is corrupt on that drive.

Interesting. I reinstalled Ubuntu ~3 months ago, but I will again. Oddly enough, I don't have any other problems on the server (currently a website, qBittorrent server, NAS and Minecraft server).

Yeah absolutely, this seems very isolated to the Minecraft files in your case. That's why I was pretty sure it wasn't really a failing disk and your smart report doesn't look like a failing disk either at all.

How did you try the fsck repair? Probably the best way is to boot up the Ubuntu CD/installer and then just go to CLI mode and do a fsck -p /dev/sda2 if that isn't what you did (this avoids all the mounting problems, and no files are actually running off the disk if you boot from the CD/USB drive).

When you said it wasn't running properly did it give you errors? If you can post some output there may be more we can do or you can just roll with the reimage if it has only been 3 months (sorry, like I said this is not common and I hate to say that, but if you have some output we can see if we can avoid it).

Yes, I did try running fsck repair. fsck tells me /dev/sdb is in use, so I unmount it, umount tells me /dev/sdb is already unmounted, and repeat. Keep in mind I was booting from a live USB install.

I think I will just reinstall soon

Hmmm,

Ok, that's actually good news, I thought you were getting failures from fsck itself that it couldn't repair the partition (not likely unless it's really bad, and I feel like you'd be having issues with your other services if it was super super corrupted, it's probably only one or two bad file pointers in the entire partition table). We can probably overcome this.

Check and make sure it isn't automounting the folders on the Ubuntu desktop or anything like that (if this is the desktop installer version).

Try unmounting all the partitions as well with:
sudo umount /dev/sdb1
sudo umount /dev/sdb2

To check all the mounts:
cd /media
automounts should show up here

Try using e2fsck -y -f /dev/sdb2 instead of fsck as sometimes this will get around that. I would try repairing by partition (sdb2). sdb1 will likely be a MSDOS boot partition and won't have anything to do with these issues.

There may also be a swap partition being loaded by /etc/fstab that is locking up the drive (or another way).

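The commands fuser -vm /dev/sdb2 and
cat /proc/mounts

may be able to tell us what is using the disk! (fuser -k /dev/sdb2 goes a step further and kills whatever is holding it open, so only use that once you know what it is.) Worst case scenario, people sometimes suggest using a different "rescue CD" if you happen to have another Linux live CD around.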
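To narrow down the "in use" message, the mount and swap tables can be filtered for the disk in question. This is only a sketch with made-up function names. It matches device names by prefix, so it would not catch LVM volumes that show up as /dev/mapper/... entries.

```shell
#!/bin/sh
# Hypothetical helpers for seeing what still has a disk in use.

# Print "device mountpoint" for every mount whose device starts with the given
# disk name. Reads /proc/mounts unless another file is passed as $2.
mounts_for_disk() {
    disk="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v d="$disk" 'index($1, d) == 1 { print $1, $2 }' "$mounts_file"
}

# Print active swap devices on the given disk. /proc/swaps starts with a
# header line, which NR > 1 skips.
swap_on_disk() {
    disk="$1"
    swaps_file="${2:-/proc/swaps}"
    awk -v d="$disk" 'NR > 1 && index($1, d) == 1 { print $1 }' "$swaps_file"
}
```

For example, mounts_for_disk /dev/sdb and swap_on_disk /dev/sdb from the live USB session. Empty output from both would suggest something other than a mount or swap is holding the device.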

was able to run fsck with your help, thank you.

fsck /dev/sdb2/
returns

/dev/sdb2: clean, 308/65536 files, 64393/2622144 blocks

Running

e2fsck -y -f /dev/sdb2
returns

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb2/: 308/65536 files (1.9% non-contiguous), 64393/262144 blocks

Excellent, if you do a:

sudo lsblk

do you have any other partitions? Depending on the options during setup that may be your only partition or you may have a sdb3, sdb4, etc.

It doesn't seem like enough files to be your root partition. Just my Raspberry Pi has 277288 files on it. My jamesachambers.com server has 510410 files.
It also didn't find anything wrong.

I'm guessing you enabled "Logical Volume Management" (I think it's default now unless you change it) and that your real "data" partition is going to be sdb5 or sdb6 or something potentially here. I don't think we have got it yet though as that output looks clean (and not nearly enough on there for it to be your entire root partition, it must be split into logical volumes I'm guessing here).

If there are more go ahead and give those a repair here too and let's see if we get any differing output (should be just a minor fix to one of them).

There were no more partitions. There was a sdb1, but I forgot to take a picture of the output as it was pretty much the same. But that was it for sdb (the boot drive, and where minecraftbe is stored)

Can you post the output of your:

sudo mount

and

sudo lsblk

?

We may need to do a fsck.ext4 -f /dev/sdb2 or a fsck.ext4 -p /dev/sdb2 potentially here. What is using /dev/sda? Or is there nothing in sda?

If you do a:
sudo find / -type f | wc -l

How many files are actually on your system? Because that repair only found 65,000 files which should not be enough for a functional Ubuntu install of any flavor. The mount command should tell you exactly what is mounted (and where) and I imagine things are not where they were intended to be / supposed to be here.
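One caveat with counting from /: find will also walk /proc, /sys and every other mounted filesystem, which inflates the number. A small sketch of a narrower check, with hypothetical function names, that reports which device is mounted as the root filesystem and counts regular files on a single filesystem only:

```shell
#!/bin/sh
# Hypothetical helpers for checking where the root filesystem really lives.

# Print the device mounted at /. Reads /proc/mounts unless another file is
# given. The last matching entry wins, since later mounts shadow earlier ones.
root_source() {
    awk '$2 == "/" { src = $1 } END { if (src != "") print src }' "${1:-/proc/mounts}"
}

# Count regular files under a directory without crossing into other mounted
# filesystems (-xdev).
count_files() {
    find "$1" -xdev -type f | wc -l | tr -d ' '
}
```

Running root_source and then sudo sh -c with count_files / on the booted system (not the live USB) would show whether the root device is on the disk that was just checked and roughly how many files it holds.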

I'm going to close this as an issue here although I'd love to hear the outcome.

I strongly recommend that nobody ever spends days troubleshooting something like this though. I would literally have got in trouble at work for troubleshooting a server this borked for more than a few hours let alone days. It's not a smart way to use your time.

If you have backups the most intelligent and efficient use of your time is to use your backup recovery plan which usually will only take a couple to a few hours.

The answer in this case is that the root install of the drive cannot be on that SSD. There aren't enough files. It's likely unintentionally installed on an old HDD (/dev/sda) which makes sense that it would assign the primary sda slot to the boot device (although not universally true, but it's usually true). It might even already be a known bad drive.

It's possible there's a base install on the SSD but that /etc/fstab was pointing to /dev/sda (or the PARTUUID of a partition on /dev/sda). If that HDD had an old Ubuntu install already, this would have resulted in very, very weird outcomes. 65,000 files might be the minimal extracted output of an install that has never been run before. I wouldn't be surprised if that is the case and there is a base install on the SSD that has never booted because /etc/fstab pointed to /dev/sda.

Was this worth the time to troubleshoot it? Absolutely not, it won't take more than a few hours to install and reconfigure everything. That's the point I wanted to make here. If your server is this broken there are almost zero cases where it is worth it to lose days to something like this when a recovery takes hours.

No offense or disrespect intended to OP here at all and this is not meant to call out Fich420 in any way. I've had this happen a few times and even linked to another time this happened (he finally reinstalled and it fixed all his issues and more) and it has always turned out every time the right answer was to reinstall immediately.

This is universally true in IT and is how a professional operates at their job. Time is money and you don't want to spend it to find out things are on the wrong disk when really really weird stuff is happening like this. Otherwise the result is: you still need to reinstall it, and all the time troubleshooting it was wasted. You should never do this unless there's no backups and you have no choice. Some troubleshooting is good, and knowing extra tricks and things is really useful, you just have to balance it and give yourself a "red line" on the time where you know it's no longer worth chasing down this specific one and to reimage/restore a backup instead.

It's easier said than done even by me as an experienced tech, but hopefully people will learn from this going forward that going down this road with a broken server doesn't really make a lot of sense! The best that can happen is you spent 20 hours and fixed it avoiding a reimage (which would have taken 2-5 hours depending on configuration) for a total waste of ~15 hours so it's just never worth it!

It's in my personality to want to always find the answer. I have to fight that every day. In this case I probably did not do Fich420 any favors here by not just telling them immediately to reinstall. If your server is getting really, really weird stuff that is definitely what you should do! It's okay to do some troubleshooting but try to keep in your mind that once the time goes past a few hours it doesn't make sense to continue chasing down weird technical issues like this!