Stuck telemetry stream
Opened this issue · 0 comments
jktjkt commented
On one of the lab boxes, the telemetry streams were not getting through -- neither on new connections, nor on the old ones. This is how the logs looked like:
Mar 15 10:11:48 roadm-c1 rousette[29777]: change: 57267 bytes
Mar 15 10:11:48 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): new event, ∑ queue size = 0
Mar 15 10:11:48 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): sent one event
Mar 15 10:11:48 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): sleeping
Mar 15 10:11:48 roadm-c1 rousette[29777]: [::1]:36514(for=_a8bpCpSe): new event, ∑ queue size = 27858024
Mar 15 10:11:49 roadm-c1 rousette[29777]: change: 57269 bytes
Mar 15 10:11:49 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): new event, ∑ queue size = 0
Mar 15 10:11:49 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): sent one event
Mar 15 10:11:49 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): sleeping
Mar 15 10:11:49 roadm-c1 rousette[29777]: [::1]:36514(for=_a8bpCpSe): new event, ∑ queue size = 27920198
I have not tried fetching other data besides the telemetry stream :(.
Stuff was fixed by restarting the reverse proxy (which took ages to finish):
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: State 'stop-sigterm' timed out. Killing.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 227 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 233 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 234 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 30866 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Main process exited, code=killed, status=9/KILL
Mar 15 10:12:00 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): closed (INTERNAL_ERROR)
Mar 15 10:12:00 roadm-c1 rousette[29777]: [::1]:36514(for=_a8bpCpSe): closed (INTERNAL_ERROR)
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Failed with result 'timeout'.
Mar 15 10:12:00 roadm-c1 systemd[1]: Stopped nghttpx reverse proxy for handling incoming HTTP requests.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Consumed 25min 12.430s CPU time.
Mar 15 10:12:00 roadm-c1 systemd[1]: Starting nghttpx reverse proxy for handling incoming HTTP requests...
Mar 15 10:12:00 roadm-c1 rousette[29777]: change: 57266 bytes
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.260Z 30885 30885 34ff0138 NOTICE (shrpx.cc:971) Listening on 0.0.0.0
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.260Z 30885 30885 34ff0138 NOTICE (shrpx.cc:971) Listening on [::]
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.261Z 30885 30885 34ff0138 NOTICE (shrpx.cc:1780) Worker process [30886] spawned
Mar 15 10:12:00 roadm-c1 systemd[1]: Started nghttpx reverse proxy for handling incoming HTTP requests.
Mar 15 10:12:00 roadm-c1 nghttpx[30886]: 2024-03-15T10:12:00.280Z 30885 30886 34ff0138 NOTICE (shrpx_connection_handler.cc:414) [LISTEN:0x02156390] Created worker thread #0
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.281Z 30885 30885 34ff0138 NOTICE (shrpx.cc:1553) Worker process pid=30886 is ready
Mar 15 10:12:00 roadm-c1 rousette[29777]: change: 57277 bytes
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): GET /telemetry/optics
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): GET /telemetry/optics
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:01 roadm-c1 rousette[29777]: change: 57269 bytes
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): GET /telemetry/optics
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: change: 57279 bytes
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: change: 57276 bytes
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sleeping
Mar 15 10:12:03 roadm-c1 rousette[29777]: change: 57273 bytes
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): new event, ∑ queue size = 0
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sent one event
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sleeping
Mar 15 10:12:04 roadm-c1 nghttpx[30886]: 10.0.0.177 - - [15/Mar/2024:10:12:02 +0000] "GET /telemetry/optics HTTP/1.1" 200 248725 "-" "curl/7.81.0"
Mar 15 10:12:04 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): closed (INTERNAL_ERROR)
And the full log from nghttpx
:
Mar 15 09:54:19 roadm-c1 nghttpx[233]: 10.0.0.177 - - [15/Mar/2024:09:50:52 +0000] "GET /telemetry/optics HTTP/1.1" 200 0 "-" "Python/3.10 aiohttp/3.9.3"
Mar 15 10:06:13 roadm-c1 nghttpx[233]: 10.0.0.177 - - [15/Mar/2024:09:54:20 +0000] "GET /telemetry/optics HTTP/1.1" 200 0 "-" "Python/3.10 aiohttp/3.9.3"
Mar 15 10:09:01 roadm-c1 nghttpx[233]: 10.0.0.177 - - [15/Mar/2024:10:08:55 +0000] "GET /telemetry/optics HTTP/1.1" 200 0 "-" "curl/7.81.0"
Mar 15 10:09:10 roadm-c1 nghttpx[233]: 10.0.0.177 - - [15/Mar/2024:10:09:02 +0000] "GET /telemetry/optics HTTP/1.1" 200 0 "-" "curl/7.81.0"
Mar 15 10:10:30 roadm-c1 nghttpx[233]: 2024-03-15T10:10:30.060Z 227 233 c9d94777 NOTICE (shrpx_worker_process.cc:113) Graceful shutdown signal received
Mar 15 10:10:30 roadm-c1 nghttpx[233]: 2024-03-15T10:10:30.062Z 227 233 24cd368c NOTICE (shrpx_worker.cc:523) [WORKER:0xd36c58] Graceful shutdown commenc
Mar 15 10:10:30 roadm-c1 systemd[1]: Stopping nghttpx reverse proxy for handling incoming HTTP requests...
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: State 'stop-sigterm' timed out. Killing.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 227 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 233 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 234 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 30866 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Main process exited, code=killed, status=9/KILL
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Failed with result 'timeout'.
Mar 15 10:12:00 roadm-c1 systemd[1]: Stopped nghttpx reverse proxy for handling incoming HTTP requests.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Consumed 25min 12.430s CPU time.
Mar 15 10:12:00 roadm-c1 systemd[1]: Starting nghttpx reverse proxy for handling incoming HTTP requests...
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.260Z 30885 30885 34ff0138 NOTICE (shrpx.cc:971) Listening on 0.0.0.0
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.260Z 30885 30885 34ff0138 NOTICE (shrpx.cc:971) Listening on [::]
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.261Z 30885 30885 34ff0138 NOTICE (shrpx.cc:1780) Worker process [30886] spawned
Mar 15 10:12:00 roadm-c1 systemd[1]: Started nghttpx reverse proxy for handling incoming HTTP requests.
Mar 15 10:12:00 roadm-c1 nghttpx[30886]: 2024-03-15T10:12:00.280Z 30885 30886 34ff0138 NOTICE (shrpx_connection_handler.cc:414) [LISTEN:0x02156390] Creat
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.281Z 30885 30885 34ff0138 NOTICE (shrpx.cc:1553) Worker process pid=30886 is ready
Mar 15 10:12:04 roadm-c1 nghttpx[30886]: 10.0.0.177 - - [15/Mar/2024:10:12:02 +0000] "GET /telemetry/optics HTTP/1.1" 200 248725 "-" "curl/7.81.0"