getting "too many open files error" in splunk
kyaparla opened this issue · 1 comment
I see the following error in Splunk, even after raising the file descriptor limit to a much higher number.
http: Accept error: accept tcp 0.0.0.0:8098: accept4: too many open files; retrying in 20ms.
The Prometheus data in Splunk is not continuous, which I think is due to the above problem: there are several gaps, and data only appears at certain intervals.
It would probably take a lot of connections to exceed the number of available fds on the system. Usually we would only see this if connections were never being released after processing, which shouldn't be the case.
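If you want to confirm that from your side, here is a rough diagnostic sketch (Linux-only, not part of this project) that compares the adapter process's open descriptor count against the limit that actually applies to it. The pid argument is whatever pid the adapter is running as:

```go
// fdcheck: count open descriptors for a pid and show its "Max open files"
// limit, both read from /proc. Purely a diagnostic sketch.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fdcheck <pid>")
		os.Exit(1)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid pid:", err)
		os.Exit(1)
	}

	// Each entry in /proc/<pid>/fd is one open descriptor held by the process.
	entries, err := os.ReadDir(fmt.Sprintf("/proc/%d/fd", pid))
	if err != nil {
		fmt.Fprintln(os.Stderr, "read fd dir:", err)
		os.Exit(1)
	}
	fmt.Printf("open file descriptors: %d\n", len(entries))

	// /proc/<pid>/limits shows the limit the running process actually has,
	// which can differ from what `ulimit -n` reports in your shell.
	limits, err := os.ReadFile(fmt.Sprintf("/proc/%d/limits", pid))
	if err == nil {
		for _, line := range strings.Split(string(limits), "\n") {
			if strings.HasPrefix(line, "Max open files") {
				fmt.Println(line)
			}
		}
	}
}
```

If that count keeps climbing toward the limit while write traffic is steady, something is holding connections open.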
How many Prometheus systems are using this as a remote write target? You could try adjusting maxClients up or down and see if that helps.
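For context, a maxClients-style cap on a Go HTTP server usually just bounds how many connections are accepted at once. Here is a minimal sketch of that idea, not the adapter's actual code: it assumes golang.org/x/net/netutil, and the 512 cap and /receive path are placeholder values (only the 8098 port comes from your log).

```go
// Illustrative sketch of a maxClients-style limit: netutil.LimitListener
// blocks further Accepts once maxClients connections are in flight, so a
// burst of remote-write requests queues instead of consuming descriptors.
package main

import (
	"log"
	"net"
	"net/http"

	"golang.org/x/net/netutil"
)

func main() {
	const maxClients = 512 // placeholder; tune to your Prometheus fleet size

	ln, err := net.Listen("tcp", ":8098")
	if err != nil {
		log.Fatal(err)
	}
	// At most maxClients connections are handled concurrently.
	limited := netutil.LimitListener(ln, maxClients)

	http.HandleFunc("/receive", func(w http.ResponseWriter, r *http.Request) {
		// Placeholder handler; a real adapter would decode the payload here.
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.Serve(limited, nil))
}
```

With a cap comfortably below the process's fd limit, bursts wait in the kernel's accept backlog rather than failing with "too many open files".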
Otherwise, can you let me know whether you have set values for net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse in your sysctl.conf?
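If it's easier, you can read those two values straight out of /proc/sys (the same information as `sysctl net.ipv4.tcp_tw_reuse net.ipv4.tcp_tw_recycle`); a small sketch:

```go
// Print the current values of the two sysctls by reading /proc/sys directly.
// Note: tcp_tw_recycle was removed in Linux 4.12, so a "no such file" result
// is expected on newer kernels.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	for _, key := range []string{"tcp_tw_reuse", "tcp_tw_recycle"} {
		data, err := os.ReadFile("/proc/sys/net/ipv4/" + key)
		if err != nil {
			fmt.Printf("net.ipv4.%s: %v\n", key, err)
			continue
		}
		fmt.Printf("net.ipv4.%s = %s\n", key, strings.TrimSpace(string(data)))
	}
}
```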
When you are in this state, does netstat show a lot of connections in TIME_WAIT?
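If netstat isn't available on that box, here is a rough equivalent of `netstat -ant | grep -c TIME_WAIT` that parses /proc/net/tcp directly:

```go
// Count sockets in state 06 (TIME_WAIT) by parsing /proc/net/tcp and
// /proc/net/tcp6. Diagnostic sketch only.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func countTimeWait(path string) int {
	f, err := os.Open(path)
	if err != nil {
		return 0 // e.g. no IPv6 table present
	}
	defer f.Close()

	count := 0
	s := bufio.NewScanner(f)
	s.Scan() // skip the header row
	for s.Scan() {
		fields := strings.Fields(s.Text())
		// The fourth column ("st") is the socket state; 06 means TIME_WAIT.
		if len(fields) > 3 && fields[3] == "06" {
			count++
		}
	}
	return count
}

func main() {
	total := countTimeWait("/proc/net/tcp") + countTimeWait("/proc/net/tcp6")
	fmt.Println("sockets in TIME_WAIT:", total)
}
```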