SSH Server crashing - 24.04
ewdurbin opened this issue · 3 comments
@pablogsal reported that downloads.nyc1.psf.io was inaccessible via SSH over the weekend.
Restarting the host resolved that access issue, but it is unclear how it reached that state. It appears that this may be related to socket activation of the SSH daemon: https://discourse.ubuntu.com/t/sshd-now-uses-socket-based-activation-ubuntu-22-10-and-later/30189
While investigating I determined that another host was in a similar state:
chungus:~ ee$ ssh consul-1.nyc1.psf.io
Connection closed by 159.203.82.149 port 22
I then investigated from the salt host:
ee@salt:~$ sudo salt 'consul-1.nyc1.psf.io' cmd.run 'systemctl status ssh.socket'
consul-1.nyc1.psf.io:
x ssh.socket - OpenBSD Secure Shell server socket
Loaded: loaded (/usr/lib/systemd/system/ssh.socket; enabled; preset: enabled)
Active: failed (Result: resources) since Fri 2024-08-23 06:01:46 UTC; 3 days ago
Duration: 1w 1d 12h 23min 22.290s
Triggers: * ssh.service
Listen: [::]:22 (Stream)
CPU: 1ms
Aug 26 16:00:40 consul-1 systemd[1]: Failed to listen on ssh.socket - OpenBSD Secure Shell server socket.
Aug 26 16:15:37 consul-1 (sd-listen)[203212]: ssh.socket: Failed to create listening socket ([::]:22): Address already in use
Aug 26 16:15:37 consul-1 systemd[1]: ssh.socket: Failed to receive listening socket ([::]:22): Input/output error
Aug 26 16:15:37 consul-1 systemd[1]: ssh.socket: Failed to listen on sockets: Input/output error
Aug 26 16:15:37 consul-1 systemd[1]: ssh.socket: Failed with result 'resources'.
Aug 26 16:15:37 consul-1 systemd[1]: Failed to listen on ssh.socket - OpenBSD Secure Shell server socket.
Aug 26 16:30:43 consul-1 systemd[1]: ssh.socket: Failed to receive listening socket ([::]:22): Input/output error
Aug 26 16:30:43 consul-1 systemd[1]: ssh.socket: Failed to listen on sockets: Input/output error
Aug 26 16:30:43 consul-1 systemd[1]: ssh.socket: Failed with result 'resources'.
Aug 26 16:30:43 consul-1 systemd[1]: Failed to listen on ssh.socket - OpenBSD Secure Shell server socket.
ERROR: Minions returned with non-zero exit code
ee@salt:~$ sudo salt 'consul-1.nyc1.psf.io' cmd.run 'systemctl status sshd.service'
consul-1.nyc1.psf.io:
* ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/usr/lib/systemd/system/ssh.service; enabled; preset: enabled)
Active: inactive (dead) since Fri 2024-08-23 06:01:46 UTC; 3 days ago
Duration: 23h 37min 48.102s
TriggeredBy: x ssh.socket
Docs: man:sshd(8)
man:sshd_config(5)
Main PID: 2794639 (code=exited, status=0/SUCCESS)
Tasks: 1 (limit: 1113)
Memory: 5.7M (peak: 73.2M swap: 0B swap peak: 3.1M)
CPU: 1min 54.393s
CGroup: /system.slice/ssh.service
`-1002 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"
Aug 26 16:30:32 consul-1 sshd[206329]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Aug 26 16:30:32 consul-1 sshd[206331]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Aug 26 16:30:43 consul-1 systemd[1]: Dependency failed for ssh.service - OpenBSD Secure Shell server.
Aug 26 16:30:43 consul-1 systemd[1]: ssh.service: Job ssh.service/start failed with result 'dependency'.
Aug 26 16:30:50 consul-1 sshd[207096]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Aug 26 16:31:43 consul-1 sshd[207224]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Aug 26 16:32:36 consul-1 sshd[207319]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Aug 26 16:33:04 consul-1 sshd[207389]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Aug 26 16:33:05 consul-1 sshd[207391]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Aug 26 16:33:27 consul-1 sshd[207453]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
ERROR: Minions returned with non-zero exit code
If this is related to the missing directory /run/sshd, that does appear to be a smoking gun:
ee@salt:~$ sudo salt '*' cmd.run 'ls /run/sshd'
planet.nyc1.psf.io:
lb-b.nyc1.psf.io:
consul-2.nyc1.psf.io:
salt.nyc1.psf.io:
docs.nyc1.psf.io:
downloads.nyc1.psf.io:
bugs.nyc1.psf.io:
gnumailman.nyc1.psf.io:
codespeed.nyc1.psf.io:
backup.sfo1.psf.io:
lb-a.nyc1.psf.io:
cdn-logs.nyc1.psf.io:
hg.nyc1.psf.io:
consul-1.nyc1.psf.io:
ls: cannot access '/run/sshd': No such file or directory
consul-3.nyc1.psf.io:
ls: cannot access '/run/sshd': No such file or directory
buildbot.nyc1.psf.io:
mail.ams1.psf.io:
moin.nyc1.psf.io:
pythontest.nyc3.psf.io:
ERROR: Minions returned with non-zero exit code
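For a larger fleet, the output above could be reduced to just the affected minions. This is a minimal sketch, not something that was run as part of the investigation: the function name missing_run_sshd is hypothetical, and it assumes salt's default text output, where each minion ID appears on its own line ending in a colon.

```shell
# Hypothetical helper: reads `salt ... cmd.run 'ls /run/sshd'` output on stdin
# and prints the minion IDs whose output reports the directory as missing.
# Assumes minion ID lines are unindented and end with ":".
missing_run_sshd() {
  awk '
    # An unindented line ending in ":" names the minion for the lines after it.
    /^[^ \t].*:$/ { host = substr($0, 1, length($0) - 1); next }
    # Report each minion once when its output shows the ls failure.
    /No such file or directory/ && host != "" { print host; host = "" }
  '
}

# Example usage (not executed here):
#   sudo salt '*' cmd.run 'ls /run/sshd' | missing_run_sshd
```

Against the output above, this should list only consul-1.nyc1.psf.io and consul-3.nyc1.psf.io.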
Creating /run/sshd restored access. It is not clear why that directory did not exist on consul-{1,3}.nyc1.psf.io, or why it was restored when rebooting downloads.nyc1.psf.io.
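The exact command used to recreate the directory is not recorded here. A minimal sketch of one way to do it follows, assuming the directory only needs to exist with mode 0755 and be owned by root when run as root. The function name ensure_privsep_dir is hypothetical.

```shell
# Hypothetical helper: make sure sshd's privilege separation directory exists.
# Defaults to /run/sshd; a different path can be passed for testing.
ensure_privsep_dir() {
  dir="${1:-/run/sshd}"
  # -p makes this a no-op when the directory is already present.
  mkdir -p "$dir" || return 1
  # Keep the directory non-writable by group and others.
  chmod 0755 "$dir"
}

# Example usage from the salt host (an assumption, not the command that was run):
#   sudo salt 'consul-1.nyc1.psf.io' cmd.run 'mkdir -p /run/sshd && chmod 0755 /run/sshd'
```

Because /run is a tmpfs, a directory created this way does not survive a reboot, so this only restores access and does not explain or prevent the disappearance.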
Possibly related: https://askubuntu.com/questions/1109934/ssh-server-stops-working-after-reboot-caused-by-missing-var-run-sshd/1110843#1110843
That answer calls out a potential conflict between systemd and the underlying kernel of VPS providers like ours (DigitalOcean).