fuzzball-muck/fuzzball

New password hashing algorhithm causes crash on login

Closed this issue · 21 comments

Finally created my own Github account just report this, instead of nudging poor @foxbird to note these things for me for years, hee.

The new password hashing algorithm (#720) causes an immediate hard crash on login. Very easy to reproduce:

Start MUCK. Log in. Change password (with legacy_password_hash at default, so the new password is hashed with the new algorithm). QUIT. Reconnect. Immediate hard crash on entering the connect <char> <password> command. Even tested it using the minimal and starter databases, to make sure it was nothing to do with my existing database.

Restart, @tune legacy_password_hash=yes. Repeat all other steps. No crash, everything works normally.

Of note: I reviewed the logs/status file. When using the new algorithm and attempting to log in after changing a character's password (i.e., after the password has been hashed to the new algorithm), the logs/status file shows

CONNECTED: Room Zero(0), descriptor 9, from localhost

instead of the correct character and dbref. This entry occurs in the instant before the fbmuck process crashes. When using the OLD algorithm, connecting a character looks normal

CONNECTED: Keeper(2), descriptor 9, from localhost

So it would appear to be something in the login code isn't handling the new hash, and screws up who/what is actually being logged in, attempting to literally log in as Room Zero. Assuming the logs/status file is correct, anyway.

Secondarily: also when using the new algorithm, the @newpassword command likewise misreports the user in the logs/status file. However, unlike the CONNECTED line, the newpassword line gets the actual dbref correct. For example, using the starter database, connecting as One, and using @newpassword to change Keeper's password, logs/status shows

NEWPASS'ED: Room Zero(2) by One(1)

When using the OLD algorithm (@tune legacy_password_hash=yes), it correctly shows

NEWPASS'ED: Keeper(2) by One(1)

Since @password doesn't add an entry to logs/status the way @newpassword does, I have no idea if it's having the same object reference weirdness or not. Nonetheless, there's all kinds of issues with the processing of the new algorithm. These things are just what I've found so far, more or less immediately upon starting up with a new git pull.

Sure hope it's something weird with my setup (bog standard Debian Bullseye), and not that something as basic as logging in after changing the password wasn't tested before this new code was committed. c.c

Thanks for reporting this. I will take a look at it ASAP (probably tomorrow) and see if I can reproduce / fix it. This was tested using roughly the procedure outlined above, so I'm a little surprised, but it seems likely I messed something up with it.

I haven't yet been able to reproduce this, but I'm still trying.

@puckmousit Hey there -- Could you send me your config.log file? I just want to make sure we're building it the same way. I am. You can attach it to this issue.

@tanabi File attached!
config.log

I've tried this from a completely fresh checkout, completely fresh install rather than my well traveled test MUCK that I did the initial testing on.

I've tried it both inside gdb (how I usually run my test MUCK) and outside of gdb (it's not unheard of for something to behave in gdb but crash outside of gdb, something about how gdb hooks into memory management...). I've tried with a few different users; new users, users that come in the stock database, etc. I've verified its using the new password hashes. So far, haven't been able to get it to crash. I see we've got a config.log file, I'll see if there's some difference there that might be causing a problem.

If I review the config.log and don't see anything there, I will try using the database from a 'live' MUCK and see if I can reproduce it ... maybe try installing puckmousit's distro in a VM and try that too ...

@puckmousit I'm glad I asked you for this because your configure command line was quite a bit different than mine (I just used --with-ssl). That said, compiling the same way you did, I'm still not able to reproduce it.

I will try installing your distro in a VM and see if I can reproduce that. If I still can't, I'll give you some instructions to run the MUCK in gdb (GNU debugger) so that I can take a peek at what's going on in your build.

@tanabi Just for the record, I pulled a fresh git clone, used ONLY --prefix and --with-ssl for configure, did a make immediately with zero edits to any files like config.h. I used the minimaldb straight out of the git clone as well. Still crashed for me.

@puckmousit I'm installing Debian Bullseye in a VM right now, fingers crossed I'll be able to reproduce it. Honestly, I'm happy it's not THAT easy to reproduce, I'd have been pretty ashamed :)

@tanabi Oh, I forgot, not 100% bog-standard. I do have the Debian Backports kernel (Bullseye Backports) installed. The only backports item that is, everything else is indeed standard Bullseye package.

@puckmousit That explains some of it :) My uname wasn't matching yours. Part of the problem is I had a Bullseye release candidate instead of the correct release, but that would be the other part.

@puckmousit Is this the proper set of instructions to set up bullseye backports ? https://wiki.debian.org/Backports I ask because it says "bookworm backports" ... but I think that's because it's backporting bookworm to bullseye.

@tanabi You would use bullseye instead of bookworm. Bookworm is the current Debian Stable, so the docs reflect that. Bullseye is the previous version (now called "oldstable"). Basically just replace any instance of bookworm with bullseye in the instructions.

@tanabi Or to make it even easier, here's my /etc/apt/sources.list file. :) With a .txt extension just for uploading.
sources.list.txt

@puckmousit Awesome, my uname matches yours now. Now, let's see if I can break some stuff :)

@puckmousit Hey, wow, I got it! Segmentation fault. I should now be able to fix it.

@tanabi I'm no programmer but I was kind of assuming it was going to be some obscure functionality change in a library version, probably openssl. Since other than the hash algorithm, I sort of figured the code that processes an actual login wouldn't need to be changed.

And now that I say that, openssl functionality changes aren't obscure. It changed hugely between Bullseye (libssl1.1) and Bookworm (libssl3). If you're similarly using a distro that's on the v3 branch, yeah I could totally see openssl being the issue.

Though if that's the culprit, definitely a big deal since there's still plenty of distros on 1.1 branch.

@puckmousit It was a good ole fashion buffer overflow that was obfuscated by my compiler :) I've got a PR in to resolve this, once @wyld-sw has merged it, master branch should be good to go again.

@puckmousit I will keep the debian VM I built around and do future testing on it as well, cause why not.

@tanabi Hurray! Also interesting. First time I've ever personally encountered compiler making the difference, though I was aware that can happen.

@puckmousit Yeah, there were a lot of issues that combined to make this a problem.

I had worked on this issue over the course of like 6+ months ... I wrote like 95% of it in a few days and then let it sit due to RL. The original version actually had the 'math' right to avoid the buffer overflow. WELL, months later I came back and reviewed the code, and I re-did the math and got it wrong the second time :)

And then I compiled and tested it, and it worked. My test MUCK is totally stable, no issues. So hey I must have done it right! But no, that was being masked by my compiler, which I think was just 'smart' enogh to give allocated memory some overflow padding. I've seen different compilers make a difference, I've seen running the debugger vs not make a difference ... sometimes it's the little things that get you.

This was a lot more common in the past ... I remember 20-some years ago, I'd compile code with Sun Studio Compiler AND GNU C so the two would double-check each other. :D

Anyway, it's helpful to know what platform our 'customers' use so now I've got a test environment I can use to make sure things work for you in the future.