PaloAltoNetworks/SafeNetworking

During heavy load, ES times out and processing halts

Closed this issue · 2 comments

When ES is under heavy load or pauses for garbage collection, the default timeout (10s) is not enough. When a request times out, all SFN processing halts and never starts again. See the stack trace:

GET http://localhost:9200/threat-*/_search [status:N/A request:10.018s]
Traceback (most recent call last):
  File "/home/ubuntu/safe-networking/sfn-env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/home/ubuntu/safe-networking/sfn-env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

There are two ways to try to fix this: set the timeout to more than 10s (as documented here), or figure out why GC is taking so long and fix that.
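Independent of the timeout value, the "halts and never starts again" part could be softened by retrying a timed-out request instead of letting the exception end the processing loop. Below is a minimal sketch using only the standard library. `call_with_retry` and its parameters are hypothetical names, not part of SafeNetworking or of the Elasticsearch client. The exception type is a parameter because the client may wrap `socket.timeout` in its own exception class, which would then need to be passed in instead.

```python
import socket
import time


def call_with_retry(fn, attempts=3, base_delay=0.5,
                    retry_on=(socket.timeout,), sleep=time.sleep):
    """Call fn(); if it raises one of `retry_on`, back off and try again.

    Hypothetical helper for illustration only. After `attempts` failed
    calls the last exception is re-raised so the caller can log it.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

A search call would then be wrapped as `call_with_retry(lambda: run_search())`, where `run_search` stands in for whatever function issues the `threat-*` query. Retrying only hides short GC pauses; if ES stays unresponsive the final exception still propagates, so the caller should catch and log it rather than stop.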

This is fixed in fbc36f0.