Helioviewer-Project/esajpip-SWHV

Unexpected denial of service

Closed this issue · 11 comments

Hello, previously we had an issue where the log file was blowing up, and you made a change to truncate it. That has kept the file system from filling up, but it doesn't address the underlying problem.

The new issue is that when this same denial of service type behavior occurs, new connections to the JPIP server can't be made and the server needs to be restarted before it resumes working.

The log is flooded with the following entries. I'm not sure whether this behavior is caused by an external user or is internal to esajpip. Please advise.

2022-06-22 12:04:49,223: New connection from 96.127.104.58:51370 [-1] 
2022-06-22 12:04:49,223: /root/build/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:139: ERROR: The new socket can not be sent to the child process: Bad file descriptor 
2022-06-22 12:04:49,223: New connection from 96.127.104.58:51370 [-1] 
2022-06-22 12:04:49,223: /root/build/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:139: ERROR: The new socket can not be sent to the child process: Bad file descriptor 
2022-06-22 12:04:49,223: New connection from 96.127.104.58:51370 [-1] 
2022-06-22 12:04:49,223: /root/build/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:139: ERROR: The new socket can not be sent to the child process: Bad file descriptor 
2022-06-22 12:04:49,223: New connection from 96.127.104.58:51370 [-1] 
2022-06-22 12:04:49,223: /root/build/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:139: ERROR: The new socket can not be sent to the child process: Bad file descriptor 
2022-06-22 12:04:49,223: New connection from 96.127.104.58:51370 [-1] 
2022-06-22 12:04:49,223: /root/build/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:139: ERROR: The new socket can not be sent to the child process: Bad file descriptor

From my reading of the code and the log, accept() returns -1, and that value is then passed to IsValid(), which checks it using poll().
I'm fairly sure that is wrong: a failed accept() should be rejected outright and the reason logged. I suspect one of these errors:

     [EMFILE]           The per-process descriptor table is full.
     [ENFILE]           The system file table is full.
     [ENOMEM]           Insufficient memory was available to complete the operation.

At least the log is no longer filling up the storage. We are getting closer to the root of the problem.

Please try with e301600.

This version is running now. Will keep an eye on it.

This was working well until now; this looks like a different issue.
Running e301600, a new problem is blocking new connections.

2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files 
2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files 
2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files 
2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files 
2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files 
2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files 
2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files 
2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files 
2022-07-11 11:25:42,505: /root/esajpip-SWHV/esajpip/esajpip/src/esa_jpip_server.cc:129: ERROR: Problems accepting a new connection: Too many open files

I think we got to the root of the problem. Could you please raise the limit on the number of open files for the user running esajpip? That should be easy on Linux and it's "harmless". I'm on vacation, typing on a phone.

This seemed to work for a while, but even after updating the limit on the number of open files this error comes up again later on. Is there a close() missing somewhere?

This happened again on our system. Does it happen at ROB?

Never noticed on the low traffic ROB server, the higher traffic IAS server also seems quite stable (maybe @ebuchlin can comment).

The program needs to open each JP2 file, since the JPX files contain just pointers to the image data. The files are opened, mmap'ed, and immediately closed (mmap adds an implicit reference to the underlying file, which is removed on munmap).

Are you sure that the limits were applied to the program's environment and to the system file table? On our server, ulimit -n is 1000000 and /proc/sys/fs/file-max is 2097152.

We don't think that we ever had this issue.

Commit e30380e and later may have solved this issue.
It occurred sometimes when a client dropped the connection immediately after esajpip accepted it. This may be, for example, a network monitoring service that periodically checks whether the server is responding.

I still recommend that the esajpip server be allocated a sufficiently large number of open files.
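If esajpip runs as a systemd service (an assumption; adjust for your init system), the limit can be pinned in a unit drop-in so it survives restarts and reboots; the file path below is illustrative:

```
# e.g. /etc/systemd/system/esajpip.service.d/limits.conf
[Service]
LimitNOFILE=1000000
```

After adding the drop-in, run systemctl daemon-reload and restart the service for the new limit to take effect.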

Since the reinstallation of the ROB server in February, we have been running esajpip on port 80. Being a common port, it is constantly scanned from all corners. This exposure allowed us to discover and fix several other descriptor leaks, one lockup, and one crash.

Please make sure you are running tag 1.7 or later.
I think what you were experiencing is now solved. I'm closing this issue; please open another if you encounter further problems.