Process is leaking file handles on Linux
I have now twice received the "Too many open files" error from idena-go. Both times I just increased the limit, but even 65535 was not high enough (I now have the limit at 1048576).
No process should be using that many file handles, so I investigated the open files:
lsof -n | awk '{ print $2; }' | sort -rn | uniq -c | sort -rn | head -1
This returned the following counts during roughly the first 30 minutes of the idena-go run:
5707 11409
8008 11409
29782 11409
23453 11409
18220 11409
11409 is the process id of the idena-go process.
Examining the lsof -n output more closely, I found it contains thousands of open IPv4 sockets. For example:
exec-iden 11409 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11410 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11411 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11412 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11413 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11414 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11415 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11416 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11417 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11419 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11434 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11445 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
exec-iden 11409 11469 mikko 436u IPv4 869981 0t0 TCP xxx.xxx.xxx.xxx:40405->130.61.245.64:40407 (ESTABLISHED)
Here I filtered for a single target IP address, but there are hundreds or thousands of IP addresses like this. I cannot find the listed process IDs (11410 and above) on my system.
At the same time, the idena-go log doesn't report an excessive number of connections; it shows total-peers=12 own-shard-peers=9.
My idena-go version is 0.27.2. I am running Debian 9.13 (stretch) on a VPS with 3 vCPUs and 4 GB RAM. The kernel version is 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 (2020-07-05) x86_64 GNU/Linux.
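One possible reading of the extra numeric column, offered as an assumption rather than something confirmed in this report: on Linux, lsof prints a TID column for per-thread (task) rows, so 11410 and above may be threads of process 11409 rather than separate processes. A minimal sketch for checking that through /proc, where `list_tasks` and `is_task_of` are hypothetical helper names:

```shell
#!/bin/sh
# Sketch only: list_tasks and is_task_of are hypothetical helper names.

# Print the thread (task) IDs of a process, one per line.
# On Linux each thread has a directory under /proc/<pid>/task.
list_tasks() {
    ls "/proc/$1/task"
}

# Succeed if ID $2 is a thread of process $1.
is_task_of() {
    [ -d "/proc/$1/task/$2" ]
}

# Example: the main thread of a process has the same ID as the process itself.
if is_task_of "$$" "$$"; then
    echo "$$ is a task of process $$; all tasks: $(list_tasks "$$" | tr '\n' ' ')"
fi
```

For the process in this report, `is_task_of 11409 11410` would test whether 11410 is one of its threads. If it is, the rows above would describe a single socket (the same descriptor, 436u) listed once per thread, not thirteen separate connections.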
Ever since posting the above, I kept getting "Too many open files" errors from the Idena process, about once a week and more often near validations.
It turned out that the limit I had set in /etc/security/limits.conf had NOT taken effect. According to /proc/****/limits, the Idena process was still limited to 4096 open files. I am running the Idena process under systemd, and setting the limits in the .service file fixed it:
LimitNOFILE=65536
LimitNOFILESoft=65536
Two things learned from all this:
- Check /proc/****/limits for the limits that are actually in effect.
- The entries lsof shows are NOT all "real" open files that count toward this limit. After all, lsof consistently showed tens of thousands of entries for my Idena process, yet it stayed under the in-effect limit of 4096 files most of the time.
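Both lessons can be checked directly from /proc without lsof. The following is a minimal sketch, assuming a Linux system with /proc mounted; `count_fds`, `soft_nofile_limit` and `fd_report` are hypothetical helper names, not part of idena-go or any standard tool:

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# Count the entries in /proc/<pid>/fd; each entry is one open descriptor
# of that process, regardless of how many threads share it.
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Print the soft "Max open files" limit from /proc/<pid>/limits.
# Line layout: "Max open files  <soft>  <hard>  files", so the soft
# value is the fourth whitespace-separated field.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Print both numbers side by side for one process.
fd_report() {
    echo "pid=$1 open_fds=$(count_fds "$1") soft_limit=$(soft_nofile_limit "$1")"
}

# Example: inspect the current shell.
fd_report "$$"
```

Running `fd_report` with the PID of the Idena process (as the same user or as root, since /proc/<pid>/fd is not readable by other users) shows how close the descriptor count is to the limit actually in effect.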