OpenIDC/liboauth2

Semaphore creation fails due to no space left

brownp2k opened this issue · 17 comments

We experienced Apache being killed (SIGSEGV), apparently due to this:
[Sun Dec 20 03:45:03.522921 2020] [oauth2:error] [pid 8085] oauth2_ipc_sema_post_config: sem_open() failed to create named semaphore /zzo-sema-8085.0x564b89a996e0: No space left on device (28)

It looks like oauth2_ipc_sema_post_config only frees the name before creating a new semaphore.

From the looks of it, a new semaphore file is created at least every 10 minutes, and there are 5 associated "sem.zzo" files created per main semaphore file. I don't see any old files getting cleaned up.
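To check whether these files really keep accumulating, the entries in /dev/shm can be counted by name prefix at intervals. This is a minimal sketch and not part of liboauth2: the helper name `count_by_prefix` is hypothetical, and the prefixes `sem.zzo` and `zzo-shm-` are simply the file names reported in this thread.

```python
import os
from collections import Counter

# Prefixes as reported in this thread; adjust them if your files are named differently.
PREFIXES = ("sem.zzo", "zzo-shm-")


def count_by_prefix(directory="/dev/shm", prefixes=PREFIXES):
    """Return a Counter mapping each prefix to the number of matching entries."""
    counts = Counter({p: 0 for p in prefixes})
    for name in os.listdir(directory):
        for p in prefixes:
            if name.startswith(p):
                counts[p] += 1
                break  # count each entry under the first matching prefix only
    return counts
```

Logging the result of `count_by_prefix()` every few minutes (for example from cron) would show whether the counts grow steadily or only jump on restarts.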

which platform are you on?

CentOS 7

This is running a source build that contains the fix you put in for handling mod order.

oauth2_ipc_sema_post_config is not supposed to be called twice; what threading model (mpm) are you using?

Server MPM:     prefork
  threaded:     no
    forked:     yes (variable process count)

can you try with worker or event for comparison?

I also applied a should-be-fix and tagged 1.4.0.1

I'm checking to see if it's possible to run with worker or event as it isn't a machine I control.

you can also skip that and test the updated master of liboauth2

Have been running with liboauth2 1.4.0.1 for about 3 hours now, and du -hs /dev/shm is showing a 0 size. Running ls -lsah /dev/shm currently shows 33 zzo-shm-* files that are all 7.9M in size. And finally, df -h shows only 40K used.

I'll check again in the morning, but it seems that 1.4.0.1 has fixed the issue.

It was running the next morning (Dec 24) but upon checking httpd this morning (Dec 28) it appears that it crashed due to SIGSEGV yesterday morning at 3am. Checking /dev/shm shows 788 files that are all 7.8M in size, yet du shows 0 and df shows 40K. Nothing in the log, and nothing in ABRT like previous crashes.

ow, can you try to make it core dump or run it in gdb?
or maybe share your setup with me (DM) so I can try and run/reproduce

Unfortunately, this is happening on our production machine so I can't readily share that setup. I've been trying to reproduce the issue in a CentOS 7 VM and haven't had any luck yet. I'm unsure whether it's an Apache-specific setup thing that I'm just not triggering in the same way or something else that is more machine/system specific.

After some more digging, I think the 3am "crash" on Dec 27 was a red herring. Log rolling activated, which triggered a graceful restart, which in turn triggered the "graceful restart resource issue" mentioned here: OpenIDC/mod_oauth2#7 (comment)

However, the 788 files in /dev/shm are all still there, but maybe that isn't as bad as it seems since df and du don't register them?
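One possible explanation for the mismatch: ls reports the apparent size of each file, while du and df count only the blocks actually allocated, and a shared memory file that has been sized but barely written to can be almost entirely sparse. The sketch below compares the two figures for the zzo-shm-* files. It assumes a Unix system where `st_blocks` is in 512-byte units; the helper names `size_report` and `summarize` are hypothetical.

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    st = os.stat(path)
    # st_size is the length that `ls -l` prints; st_blocks counts the
    # 512-byte units actually allocated, which is what `du` adds up.
    return st.st_size, st.st_blocks * 512


def summarize(directory="/dev/shm", prefix="zzo-shm-"):
    """Total apparent and allocated bytes of files whose name starts with prefix."""
    apparent = allocated = 0
    for name in os.listdir(directory):
        if not name.startswith(prefix):
            continue
        try:
            a, b = size_report(os.path.join(directory, name))
        except FileNotFoundError:
            continue  # the file was removed between listdir() and stat()
        apparent += a
        allocated += b
    return apparent, allocated
```

If `summarize()` returns a large apparent total next to a small allocated total, the files are mostly sparse and take up little memory, although each one still occupies a directory entry.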

A possibly related issue I ran into this morning is that Apache failed to restart after performing a shared memory cleanup:
[Sat Jan 23 02:46:43.778903 2021] [core:emerg] [pid 26292] (28)No space left on device: AH00023: Couldn't create the rewrite-map mutex

Googling led to:
https://serverfault.com/questions/991946/no-space-left-on-device-ah00023-couldnt-create-the-mpm-accept-mutex-when-re?newreg=460432d6a1dd4d8d98adc3daecead8e1

Clearing out the listed apache semaphores based on that link's advice allowed Apache to restart without failing.
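For reference, that kind of cleanup can be sketched as follows. This assumes the semaphores in question are System V semaphore arrays as listed by `ipcs -s`, that the output uses the usual Linux column layout (key, semid, owner, perms, nsems), and that httpd runs as the user `apache`. The helper names are hypothetical, and the sketch only builds the `ipcrm -s` commands rather than running them.

```python
def semids_owned_by(ipcs_output, owner):
    """Extract semaphore array ids owned by `owner` from `ipcs -s` text."""
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows look like: key semid owner perms nsems
        if len(fields) == 5 and fields[0].startswith("0x") and fields[2] == owner:
            ids.append(int(fields[1]))
    return ids


def ipcrm_commands(semids):
    """Build one `ipcrm -s <semid>` argument list per semaphore array."""
    return [["ipcrm", "-s", str(semid)] for semid in semids]
```

The input text could come from `subprocess.run(["ipcs", "-s"], capture_output=True, text=True).stdout`. It is worth reviewing the generated commands with httpd stopped before executing any of them, since removing a semaphore that a running process still uses will break that process.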

ok, thanks for the additional info, hope to get to the bottom of this soon

can you try 7de0b49 ?

A quick test this morning shows that 7de0b49 allows httpd to be restarted without any apparent issues, and it also appears that zzo-shm-* files are no longer being created in /dev/shm.