Semaphore creation fails due to no space left

Question

brownp2k opened this issue 4 years ago · 17 comments

We experienced Apache being killed (SIGSEGV), apparently due to this:
[Sun Dec 20 03:45:03.522921 2020] [oauth2:error] [pid 8085] oauth2_ipc_sema_post_config: sem_open() failed to create named semaphore /zzo-sema-8085.0x564b89a996e0: No space left on device (28)

It looks like oauth2_ipc_sema_post_config only frees the name before creating a new semaphore.

From the looks of it, a new semaphore file is created at least every 10 minutes, and five associated "sem.zzo" files are created per main semaphore file. I don't see any old files being cleaned up.
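
To check whether these files really accumulate over time, the entries under /dev/shm can be counted by prefix at intervals. This is a minimal sketch, not part of liboauth2: the function name `count_by_prefix` is hypothetical, and the prefixes are simply the ones reported in this thread (on Linux, glibc stores a named semaphore `/NAME` as `/dev/shm/sem.NAME`).

```python
import os

def count_by_prefix(directory, prefixes):
    """Count directory entries whose names start with each given prefix."""
    counts = {prefix: 0 for prefix in prefixes}
    for name in os.listdir(directory):
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break  # count each entry once, under the first matching prefix
    return counts

if __name__ == "__main__":
    # Prefixes as reported in this thread; adjust them if your build differs.
    shm_dir = "/dev/shm"
    if os.path.isdir(shm_dir):
        print(count_by_prefix(shm_dir, ["sem.zzo", "zzo-shm-"]))
```

Running this from cron every few minutes and comparing the counts would show whether old entries are ever removed.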

Answer 1 · 2020-12-22T21:04:19.000Z

which platform are you on?

Answer 2 · 2020-12-22T21:16:39.000Z

CentOS 7

This is running a source build that contains the fix you put in for handling mod order.

Answer 3 · 2020-12-23T07:00:15.000Z

oauth2_ipc_sema_post_config is not supposed to be called twice; what threading model (mpm) are you using?

Answer 4 · 2020-12-23T10:04:45.000Z

Server MPM:     prefork
  threaded:     no
    forked:     yes (variable process count)

Answer 5 · 2020-12-23T12:05:44.000Z

can you try with worker or event for comparison?

Answer 6 · 2020-12-23T14:14:11.000Z

I also applied a should-be-fix and tagged 1.4.0.1

Answer 7 · 2020-12-23T14:38:14.000Z

I'm checking to see if it's possible to run with worker or event as it isn't a machine I control.

Answer 8 · 2020-12-23T14:43:25.000Z

you can also skip that and test the updated master of liboauth2

Answer 9 · 2020-12-23T22:49:28.000Z

Have been running with liboauth2 1.4.0.1 for about 3 hours now, and du -hs /dev/shm is showing a 0 size. Running ls -lsah /dev/shm currently shows 33 zzo-shm-* files that are all 7.9M in size. And finally, df -h shows only 40K used.

I'll check again in the morning, but it seems that 1.4.0.1 has fixed the issue.

Answer 10 · 2020-12-28T10:04:03.000Z

It was running the next morning (Dec 24) but upon checking httpd this morning (Dec 28) it appears that it crashed due to SIGSEGV yesterday morning at 3am. Checking /dev/shm shows 788 files that are all 7.8M in size, yet du shows 0 and df shows 40K. Nothing in the log, and nothing in ABRT like previous crashes.

Answer 11 · 2020-12-28T12:04:31.000Z

ow, can you try to make it core dump or run it in gdb?
or maybe share your setup with me (DM) so I can try and run/reproduce

Answer 12 · 2020-12-29T12:23:14.000Z

Unfortunately, this is happening on our production machine, so I can't readily share that setup. I've been trying to reproduce the issue in a CentOS 7 VM and haven't had any luck yet. I'm unsure whether it's an Apache-specific setup thing that I'm just not triggering in the same way, or something else that is more machine/system specific.

Answer 13 · 2020-12-29T15:34:50.000Z

After some more digging, I think the 3am "crash" on Dec 27 was a red herring. Log rolling activated, which triggered a graceful restart, which in turn triggered the "graceful restart resource issue" mentioned here: OpenIDC/mod_oauth2#7 (comment)

However, the 788 files in /dev/shm are all still there, but maybe that isn't as bad as it seems since df and du don't register them?
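
One likely explanation for the mismatch: `ls` reports a file's apparent size, while `du` and `df` count the blocks actually allocated, so a file that has been sized but barely written to looks large in `ls` and near zero in `du`. The sketch below reproduces that effect with an ordinary sparse file. It is an illustration under that assumption, not a description of what liboauth2 does, and the helper names `sizes` and `make_sparse_file` are hypothetical.

```python
import os
import tempfile

def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_size is what `ls -l` shows; st_blocks counts 512-byte units,
    # which is what `du` adds up.
    return st.st_size, st.st_blocks * 512

def make_sparse_file(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse-demo")
        make_sparse_file(p, 8 * 1024 * 1024)
        apparent, allocated = sizes(p)
        print(f"apparent={apparent} allocated={allocated}")
```

Applying `sizes` to a few of the zzo-shm-* files would show whether they hold real data. Even if they do not, each named object stays in /dev/shm until something unlinks it.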

Answer 14 · 2021-01-23T10:07:40.000Z

A possibly related issue I ran into this morning is that Apache failed to restart after performing a shared memory cleanup:
[Sat Jan 23 02:46:43.778903 2021] [core:emerg] [pid 26292] (28)No space left on device: AH00023: Couldn't create the rewrite-map mutex

Googling led to:
https://serverfault.com/questions/991946/no-space-left-on-device-ah00023-couldnt-create-the-mpm-accept-mutex-when-re?newreg=460432d6a1dd4d8d98adc3daecead8e1

Clearing out the listed apache semaphores based on that link's advice allowed Apache to restart without failing.
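
For context, error 28 (ENOSPC) from a mutex creation call often means a System V semaphore limit has been reached rather than a full disk. Leftover semaphore sets can be listed with `ipcs -s` and removed with `ipcrm -s <semid>`. The following is a minimal sketch that only parses `ipcs -s` output and prints the matching `ipcrm` commands for review. The function name `semids_for_owner`, the owner name `apache`, and the assumed column layout are illustrative assumptions, not taken from the linked answer.

```python
import subprocess

def semids_for_owner(ipcs_output, owner):
    """Extract semaphore set IDs owned by `owner` from `ipcs -s` text output.

    Assumes the Linux util-linux column layout: key semid owner perms nsems.
    """
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows have 5 columns and start with a hex key such as 0x00000000.
        if len(fields) == 5 and fields[0].startswith("0x") and fields[2] == owner:
            ids.append(int(fields[1]))
    return ids

if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["ipcs", "-s"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        out = ""  # ipcs missing or failed; nothing to report
    for semid in semids_for_owner(out, "apache"):
        # Print rather than execute, so the list can be checked first.
        print(f"ipcrm -s {semid}")
```

Run the printed commands only while httpd is stopped, since removing a semaphore set that a live process still uses will break that process.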

Answer 15 · 2021-01-24T18:16:25.000Z

ok, thanks for the additional info, hope to get to the bottom of this soon

Answer 16 · 2021-01-30T22:01:24.000Z

can you try 7de0b49 ?

Answer 17 · 2021-02-01T10:23:30.000Z

A quick test this morning shows that 7de0b49 allows httpd to be restarted without any apparent issues, and it also appears that zzo-shm-* files are no longer being created in /dev/shm.