openHPI/poseidon

Investigate Watchdog-triggered service restarts

Closed this issue · 11 comments

Recently, we introduced a systemd watchdog with #478. Since then, we are able to monitor hanging or irresponsive instances of Poseidon. Just a few hours after merging and deploying #516, the first issue occurred:

Log excerpt

Jan 24 21:34:01 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:01.939323Z" level=debug code=204 duration=165.205872ms method=PATCH path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 24 21:34:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:02.005082Z" level=debug code=200 duration="206.083µs" method=POST path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/execute user_agent="Faraday v2.9.0"
Jan 24 21:34:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:02.023497Z" level=info msg="Running execution" environment_id=29 executionID=39347e6f-0a68-4445-9a41-a7e95887cd30 package=api runner_id=29-6ad9cb23-baff-11ee-bdbd-fa163e
351aea
Jan 24 21:34:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:02.384744Z" level=info msg="Execution returned" environment_id=29 package=api runner_id=29-6ad9cb23-baff-11ee-bdbd-fa163e351aea
Jan 24 21:34:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:02.384952Z" level=debug code=200 duration=361.601934ms method=GET path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/websocket user_agent=
Jan 24 21:34:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:02.513860Z" level=debug code=204 duration="84.441µs" method=GET path=/api/v1/health user_agent="Faraday v2.9.0"
Jan 24 21:34:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:02.583203Z" level=debug code=200 duration=188.614668ms method=GET path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 24 21:34:37 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:37.424957Z" level=debug code=204 duration=164.30203ms method=PATCH path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 24 21:34:37 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:37.491384Z" level=debug code=200 duration="209.57µs" method=POST path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/execute user_agent="Faraday v2.9.0"
Jan 24 21:34:37 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:37.511312Z" level=info msg="Running execution" environment_id=29 executionID=f54a94ad-7514-4f4a-8f5d-69dccf31faf5 package=api runner_id=29-6ad9cb23-baff-11ee-bdbd-fa163e
351aea
Jan 24 21:34:37 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:37.870235Z" level=info msg="Execution returned" environment_id=29 package=api runner_id=29-6ad9cb23-baff-11ee-bdbd-fa163e351aea
Jan 24 21:34:37 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:37.870416Z" level=debug code=200 duration=359.221779ms method=GET path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/websocket user_agent=
Jan 24 21:34:38 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:38.070750Z" level=debug code=200 duration=196.229579ms method=GET path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 24 21:34:44 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:44.436247Z" level=debug code=204 duration="85.794µs" method=GET path=/api/v1/health user_agent="check_http/v2.2 (monitoring-plugins 2.2)"
Jan 24 21:34:49 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:49.410189Z" level=debug code=204 duration=164.210478ms method=PATCH path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 24 21:34:49 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:49.474266Z" level=debug code=200 duration="220.841µs" method=POST path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/execute user_agent="Faraday v2.9.0"
Jan 24 21:34:49 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:49.493323Z" level=info msg="Running execution" environment_id=29 executionID=e5d3f78d-4479-4254-8c23-dc4b1eb82e04 package=api runner_id=29-6ad9cb23-baff-11ee-bdbd-fa163e
351aea
Jan 24 21:34:49 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:49.862356Z" level=info msg="Execution returned" environment_id=29 package=api runner_id=29-6ad9cb23-baff-11ee-bdbd-fa163e351aea
Jan 24 21:34:49 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:49.862474Z" level=debug code=200 duration=369.287602ms method=GET path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/websocket user_agent=
Jan 24 21:34:50 poseidon-terraform poseidon[317966]: time="2024-01-24T21:34:50.055811Z" level=debug code=200 duration=189.319303ms method=GET path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 24 21:35:02 poseidon-terraform telegraf[632]: 2024-01-24T21:35:02Z E! [outputs.influxdb_v2] When writing to [http://monitoring-terraform.compute.internal:8086]: Post "http://monitoring-terraform.compute.internal:8086/api/v2/write?bucket=tele
graf&org=codeocean": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Jan 24 21:35:02 poseidon-terraform telegraf[632]: 2024-01-24T21:35:02Z E! [agent] Error writing to outputs.influxdb_v2: failed to send metrics to any configured server(s)
Jan 24 21:35:02 poseidon-terraform systemd[1]: poseidon.service: Watchdog timeout (limit 15s)!
Jan 24 21:35:02 poseidon-terraform systemd[1]: poseidon.service: Killing process 317966 (poseidon) with signal SIGABRT.
Jan 24 21:35:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:35:02.170308Z" level=info msg="Received SIGINT, shutting down..." package=main
Jan 24 21:35:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:35:02.170495Z" level=info msg="Server closed" error="http: Server closed" listener="0.0.0.0:7200" package=main
Jan 24 21:35:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:35:02.170657Z" level=info msg="Stopped KeepEnvironmentsSynced" error="<nil>" package=environment
Jan 24 21:35:02 poseidon-terraform poseidon[317966]: 2024/01/24 21:35:02 influxdb2client E! Write error: internal error: unexpected error writing points to database: timeout, batch kept for retrying
Jan 24 21:35:02 poseidon-terraform poseidon[317966]: time="2024-01-24T21:35:02.737923Z" level=debug code=204 duration=1.76297424s method=PATCH path=/api/v1/runners/29-6ad9cb23-baff-11ee-bdbd-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 24 21:35:03 poseidon-terraform poseidon[317966]: time="2024-01-24T21:35:03.248937Z" level=debug msg="Stopping CPU profiler" package=main
Jan 24 21:35:03 poseidon-terraform poseidon-exit-alert.sh[322522]: Poseidon exited unsuccessful. Sending Sentry Event
Jan 24 21:35:03 poseidon-terraform poseidon-exit-alert.sh[322523]: Event dispatched: bb0527df-55a4-4456-aa02-b6c8b273bb3f
Jan 24 21:35:03 poseidon-terraform systemd[1]: poseidon.service: Failed with result 'watchdog'.
Jan 24 21:35:03 poseidon-terraform systemd[1]: poseidon.service: Consumed 1h 1min 33.517s CPU time.
Jan 24 21:35:33 poseidon-terraform systemd[1]: poseidon.service: Scheduled restart job, restart counter is at 1.
Jan 24 21:35:33 poseidon-terraform systemd[1]: Stopped Poseidon.
Jan 24 21:35:33 poseidon-terraform systemd[1]: poseidon.service: Consumed 1h 1min 33.517s CPU time.
Jan 24 21:35:33 poseidon-terraform systemd[1]: Starting Poseidon...
Jan 24 21:35:33 poseidon-terraform poseidon[322530]: time="2024-01-24T21:35:33.690733Z" level=debug msg="Starting CPU profiler" package=main
Jan 24 21:35:33 poseidon-terraform poseidon[322530]: time="2024-01-24T21:35:33.697778Z" level=info msg="Loading environments" package=environment

I noticed that issue through POSEIDON-4P. As a consequence, we should probably investigate what caused the issue and try finding the root cause. Maybe we also need some trace logging to investigate further...

We saw further occurrences of this issue happening. In order to provide more specific details, I temporarily enabled trace logs and collected one occurrence below. Interestingly, Poseidon reported that the watchdog-triggered health request took 7 seconds, which is already half the watchdog timeout. Together with the configuration of (watchdog timeout) / 2, this exceeded the systemd enforced timeout, leading to a restart.

Details

Jan 25 09:12:29 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:29.025223Z" level=trace code=204 duration="78.85µs" method=GET path=/api/v1/health user_agent="Poseidon/f16efa6 (CSSGx0XUZEgJqtTsEnCxAwvQFySoK-AQackBqWfPTO0=) Go-http-clie
nt/1.1"
Jan 25 09:12:29 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:29.026616Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 25 09:12:36 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:36.528274Z" level=trace code=204 duration="78.209µs" method=GET path=/api/v1/health user_agent="Poseidon/f16efa6 (CSSGx0XUZEgJqtTsEnCxAwvQFySoK-AQackBqWfPTO0=) Go-http-cli
ent/1.1"
Jan 25 09:12:36 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:36.529298Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.064121Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo -
ne \\\"\\\\x1EPoseidon /bin/bash -c \\$(date +%s%3N)\\\\x1E\\\"; /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin; \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\
"; env CODEOCEAN=true /bin/bash -c \\\\\\\"rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin;\\\\\\\"; ec=\\\\\\$?; echo -ne \\\\\\\"\\\\\\\\x1EPoseidon exit \\\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\" 
&& exit \\\\\\$ec\\\"\"" environment_id=29 package=nomad runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.216597Z" level=debug code=204 duration=153.516933ms method=PATCH path=/api/v1/runners/29-2165e4b0-bb61-11ee-b9a7-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.254843Z" level=debug code=200 duration="183.45µs" method=POST path=/api/v1/runners/29-2165e4b0-bb61-11ee-b9a7-fa163e351aea/execute user_agent="Faraday v2.9.0"
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.273087Z" level=info msg="Running execution" environment_id=29 executionID=f930f459-f060-4583-8138-2ec26926a0b4 package=api runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e35
1aea
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.273204Z" level=trace msg="Sending message to client" environment_id=29 message="&{start  0}" package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.273371Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo -
ne \\\"\\\\x1EPoseidon /sbin/setuser user \\$(date +%s%3N)\\\\x1E\\\"; /sbin/setuser user /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon until \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; until [ -e /tmp/stderr_1706173959273277803.fifo ]; do 
sleep 0.01; done; (echo -ne \\\\\\\"\\\\\\\\x1EPoseidon python3 -B -m unittest --verbose test_style \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; env CODEOCEAN=true /bin/bash -c \\\\\\\"python3 -B -m unittest --verbose test_style\\\\\\\"; ec=\\\\\\$?
; echo -ne \\\\\\\"\\\\\\\\x1EPoseidon exit \\\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\" && exit \\\\\\$ec) 2> /tmp/stderr_1706173959273277803.fifo\\\"\"" environment_id=29 package=nomad runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.273568Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon /sbin/setuser user $(date +%s%3N)\\x1E\"; /sbin/setuser user /bin/bash -c \"echo -ne 
\\\"\\\\x1EPoseidon mkfifo \\$(date +%s%3N)\\\\x1E\\\"; mkfifo /tmp/stderr_1706173959273277803.fifo && (cat /tmp/stderr_1706173959273277803.fifo; rm /tmp/stderr_1706173959273277803.fifo); ec=\\$?; echo -ne \\\"\\\\x1EPoseidon exit \\$ec \\$(date
 +%s%3N)\\\\x1E\\\" && exit \\$ec\"" environment_id=29 package=nomad runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.920419Z" level=trace msg="Sending message to client" environment_id=29 message="&{stdout ************* Module aufgabe121\r\n aufgabe121.py:2: convention (C0326, bad-whi
tespace, ) No space allowed before bracket\r\n print (\"hallo\")\r\n       ^\r\n \r\n -----------------------------------\r\n Your code has been rated at 0.00/10\r\n \r\n 0}" package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:39 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:39.988776Z" level=trace msg="Sending message to client" environment_id=29 message="&{stdout   0}" package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.063612Z" level=info msg="Execution returned" environment_id=29 package=api runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.063751Z" level=debug code=200 duration=790.745508ms method=GET path=/api/v1/runners/29-2165e4b0-bb61-11ee-b9a7-fa163e351aea/websocket user_agent=
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.064276Z" level=trace msg="Sending message to client" environment_id=29 message="&{exit  0}" package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.064985Z" level=trace msg="Writing loop done" environment_id=29 package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.153736Z" level=debug code=200 duration="162.39µs" method=POST path=/api/v1/runners/29-2165e4b0-bb61-11ee-b9a7-fa163e351aea/execute user_agent="Faraday v2.9.0"
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.173107Z" level=info msg="Running execution" environment_id=29 executionID=f998e567-62e5-4611-8246-cd1413000d54 package=api runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e35
1aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.173257Z" level=trace msg="Sending message to client" environment_id=29 message="&{start  0}" package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.173601Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo -
ne \\\"\\\\x1EPoseidon /sbin/setuser user \\$(date +%s%3N)\\\\x1E\\\"; /sbin/setuser user /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon until \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; until [ -e /tmp/stderr_1706173960173508421.fifo ]; do 
sleep 0.01; done; (echo -ne \\\\\\\"\\\\\\\\x1EPoseidon python3 -B -m unittest --verbose r121 \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; env CODEOCEAN=true /bin/bash -c \\\\\\\"python3 -B -m unittest --verbose r121\\\\\\\"; ec=\\\\\\$?; echo -ne \
\\\\\\"\\\\\\\\x1EPoseidon exit \\\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\" && exit \\\\\\$ec) 2> /tmp/stderr_1706173960173508421.fifo\\\"\"" environment_id=29 package=nomad runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.173875Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon /sbin/setuser user $(date +%s%3N)\\x1E\"; /sbin/setuser user /bin/bash -c \"echo -ne 
\\\"\\\\x1EPoseidon mkfifo \\$(date +%s%3N)\\\\x1E\\\"; mkfifo /tmp/stderr_1706173960173508421.fifo && (cat /tmp/stderr_1706173960173508421.fifo; rm /tmp/stderr_1706173960173508421.fifo); ec=\\$?; echo -ne \\\"\\\\x1EPoseidon exit \\$ec \\$(date
 +%s%3N)\\\\x1E\\\" && exit \\$ec\"" environment_id=29 package=nomad runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.411149Z" level=trace msg="Sending message to client" environment_id=29 message="&{stderr test_assess (r121.Assess) ...  0}" package=ws runner_id=29-2165e4b0-bb61-11ee-b
9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.411307Z" level=trace msg="Sending message to client" environment_id=29 message="&{stderr ok\r\n\r\n----------------------------------------------------------------------\r\nRan 1 test in 0.000s\r\n\r\nOK\r\n 0}" package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.476147Z" level=info msg="Execution returned" environment_id=29 package=api runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.476244Z" level=debug code=200 duration=303.261173ms method=GET path=/api/v1/runners/29-2165e4b0-bb61-11ee-b9a7-fa163e351aea/websocket user_agent=
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.476676Z" level=trace msg="Sending message to client" environment_id=29 message="&{exit  0}" package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:40 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:40.476901Z" level=trace msg="Writing loop done" environment_id=29 package=ws runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:48 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:48.342542Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo -ne \\\"\\\\x1EPoseidon /bin/bash -c \\$(date +%s%3N)\\\\x1E\\\"; /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin; \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; env CODEOCEAN=true /bin/bash -c \\\\\\\"rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin;\\\\\\\"; ec=\\\\\\$?; echo -ne \\\\\\\"\\\\\\\\x1EPoseidon exit \\\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\" && exit \\\\\\$ec\\\"\"" environment_id=29 package=nomad runner_id=29-45bde69c-bb5f-11ee-b9a7-fa163e351aea
Jan 25 09:12:49 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:49.153235Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo -ne \\\"\\\\x1EPoseidon /bin/bash -c \\$(date +%s%3N)\\\\x1E\\\"; /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin; \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; env CODEOCEAN=true /bin/bash -c \\\\\\\"rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin;\\\\\\\"; ec=\\\\\\$?; echo -ne \\\\\\\"\\\\\\\\x1EPoseidon exit \\\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\" && exit \\\\\\$ec\\\"\"" environment_id=29 package=nomad runner_id=29-2165e4b0-bb61-11ee-b9a7-fa163e351aea
Jan 25 09:12:51 poseidon-terraform systemd[1]: poseidon.service: Watchdog timeout (limit 15s)!
Jan 25 09:12:51 poseidon-terraform systemd[1]: poseidon.service: Killing process 7526 (poseidon) with signal SIGABRT.
Jan 25 09:12:51 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:51.581537Z" level=info msg="Received SIGINT, shutting down..." package=main
Jan 25 09:12:51 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:51.581673Z" level=info msg="Server closed" error="http: Server closed" listener="0.0.0.0:7200" package=main
Jan 25 09:12:51 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:51.582480Z" level=info msg="Stopped KeepEnvironmentsSynced" error="<nil>" package=environment
Jan 25 09:12:51 poseidon-terraform poseidon[7526]: 2024/01/25 09:12:51 influxdb2client E! Write error: internal error: unexpected error writing points to database: timeout, batch kept for retrying
Jan 25 09:12:51 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:51.682991Z" level=trace msg="Retrying to write influx data..." error="internal error: unexpected error writing points to database: timeout" package=monitoring retryAttempts=0
Jan 25 09:12:51 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:51.683209Z" level=trace code=204 duration=7.650711239s method=GET path=/api/v1/health user_agent="Poseidon/f16efa6 (CSSGx0XUZEgJqtTsEnCxAwvQFySoK-AQackBqWfPTO0=) Go-http-client/1.1"
Jan 25 09:12:51 poseidon-terraform telegraf[633]: 2024-01-25T09:12:51Z E! [outputs.influxdb_v2] When writing to [http://monitoring-terraform.compute.internal:8086]: Post "http://monitoring-terraform.compute.internal:8086/api/v2/write?bucket=telegraf&org=codeocean": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Jan 25 09:12:51 poseidon-terraform telegraf[633]: 2024-01-25T09:12:51Z E! [agent] Error writing to outputs.influxdb_v2: failed to send metrics to any configured server(s)
Jan 25 09:12:51 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:51.840372Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 25 09:12:52 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:52.201294Z" level=debug code=204 duration=3.859554814s method=PATCH path=/api/v1/runners/29-45bde69c-bb5f-11ee-b9a7-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 25 09:12:52 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:52.201417Z" level=debug code=204 duration=3.048805274s method=PATCH path=/api/v1/runners/29-2165e4b0-bb61-11ee-b9a7-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 25 09:12:52 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:52.269767Z" level=trace msg="Current Memory Usage" heap=42436480 package=main
Jan 25 09:12:52 poseidon-terraform poseidon[7526]: time="2024-01-25T09:12:52.629717Z" level=debug msg="Stopping CPU profiler" package=main
Jan 25 09:12:52 poseidon-terraform poseidon-exit-alert.sh[7871]: Poseidon exited unsuccessful. Sending Sentry Event
Jan 25 09:12:52 poseidon-terraform poseidon-exit-alert.sh[7872]: Event dispatched: d0ed36c2-4efa-43f3-baa2-088f9f6b5d3c
Jan 25 09:12:52 poseidon-terraform systemd[1]: poseidon.service: Failed with result 'watchdog'.
Jan 25 09:12:52 poseidon-terraform systemd[1]: poseidon.service: Consumed 4min 18.928s CPU time.
Jan 25 09:13:22 poseidon-terraform systemd[1]: poseidon.service: Scheduled restart job, restart counter is at 1.
Jan 25 09:13:22 poseidon-terraform systemd[1]: Stopped Poseidon.
Jan 25 09:13:22 poseidon-terraform systemd[1]: poseidon.service: Consumed 4min 18.928s CPU time.
Jan 25 09:13:22 poseidon-terraform systemd[1]: Starting Poseidon...

Thank you for your investigation! The fact that the health request is sometimes getting stuck, is worrying. Since deploying #478 two days ago, we have not seen this error on staging. Also in my local environment, I could not reproduce this behavior.

The only lead I have now is the blocking behavior of a few Mutex we encounter when checking the Prewarming Pool Size in the Health route. This lead does not seem very plausible, especially when considering the small amount of noise of other requests.

In #542, I have added a debug Goroutine that prints a stack trace when a health request is taking that long. This hopefully shows us where the unresponsive Goroutine is stuck.. I've deployed it already to our staging environment, but as we have not seen this behavior there the last two days, we may not get any results from that.

Thanks again for adding the debug statements. I just deployed Poseidon to our production environment including your latest changes, so that we can gather additional details. For that, I also enabled the TRACE logging (manually).

As discussed, I deployed the changes to our production environment. There, the issue re-appeared earlier today and the trace logs are attached. Unfortunately, however, the added debug statements were not executed (or I cannot find them in the logs), so that we still don't have a clue what's going on. Further, a grep on "Health route latency is too high" in the syslog further revealed "Pattern not found (press RETURN)". At least, the behaviour is somewhat "reproducible" among builds, allowing us to investigate further.

Details

Jan 26 15:14:00 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:00.246656Z" level=trace msg="Current Memory Usage" heap=14362192 package=main
Jan 26 15:14:02 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:02.487232Z" level=debug code=204 duration="78.149µs" method=GET path=/api/v1/health user_agent="Faraday v2.9.0"
Jan 26 15:14:02 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:02.487295Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:05 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:05.712254Z" level=trace code=204 duration="91.455µs" method=GET path=/api/v1/health user_agent="Poseidon/213628b (d16SEGg0TGhTybvPurAtqitw9WVBuTX9DejRqYxMGTw=) Go-http-cl
ient/1.1"
Jan 26 15:14:05 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:05.714576Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 26 15:14:05 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:05.712315Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:13 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:13.215730Z" level=trace code=204 duration="67.488µs" method=GET path=/api/v1/health user_agent="Poseidon/213628b (d16SEGg0TGhTybvPurAtqitw9WVBuTX9DejRqYxMGTw=) Go-http-cl
ient/1.1"
Jan 26 15:14:13 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:13.216144Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:13 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:13.216920Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 26 15:14:20 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:20.717831Z" level=trace code=204 duration="78.23µs" method=GET path=/api/v1/health user_agent="Poseidon/213628b (d16SEGg0TGhTybvPurAtqitw9WVBuTX9DejRqYxMGTw=) Go-http-cli
ent/1.1"
Jan 26 15:14:20 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:20.717877Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:20 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:20.719784Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 26 15:14:26 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:26.394373Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:26 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:26.394399Z" level=debug code=204 duration="109.519µs" method=GET path=/api/v1/health user_agent="Faraday v2.9.0"
Jan 26 15:14:26 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:26.404478Z" level=debug code=204 duration="46.9µs" method=GET path=/api/v1/health user_agent="Faraday v2.9.0"
Jan 26 15:14:26 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:26.404488Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:28 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:28.221080Z" level=trace code=204 duration="93.849µs" method=GET path=/api/v1/health user_agent="Poseidon/213628b (d16SEGg0TGhTybvPurAtqitw9WVBuTX9DejRqYxMGTw=) Go-http-cl
ient/1.1"
Jan 26 15:14:28 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:28.221097Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:28 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:28.222251Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 26 15:14:30 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:30.249698Z" level=trace msg="Current Memory Usage" heap=21962992 package=main
Jan 26 15:14:35 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:35.724433Z" level=trace code=204 duration="70.475µs" method=GET path=/api/v1/health user_agent="Poseidon/213628b (d16SEGg0TGhTybvPurAtqitw9WVBuTX9DejRqYxMGTw=) Go-http-cl
ient/1.1"
Jan 26 15:14:35 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:35.724457Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:35 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:35.725847Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.064448Z" level=debug code=410 duration="327.533µs" method=PATCH path=/api/v1/runners/29-b410eaf0-bc4a-11ee-81fc-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.067293Z" level=debug code=200 duration="177.55µs" method=POST path=/api/v1/runners user_agent="Faraday v2.9.0"
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.081503Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo 
-ne \\\"\\\\x1EPoseidon /bin/bash -c \\$(date +%s%3N)\\\\x1E\\\"; /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin; \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\
\"; env CODEOCEAN=true /bin/bash -c \\\\\\\"rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin;\\\\\\\"; ec=\\\\\\$?; echo -ne \\\\\\\"\\\\\\\\x1EPoseidon exit \\\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"
 && exit \\\\\\$ec\\\"\"" environment_id=29 package=nomad runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.096077Z" level=debug msg="Handle Allocation Event" ClientStatus=pending DesiredStatus=run NextAllocation= PrevAllocation= alloc_id=bd369cf7-2223-45b8-fe75-b45538b456b1
 package=nomad
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.122194Z" level=debug msg="Ignoring duplicate event" allocID=9637af18-0415-e860-c6d2-b35c68bf5643 package=nomad
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.236484Z" level=debug code=204 duration=155.311166ms method=PATCH path=/api/v1/runners/29-c62401b4-bc50-11ee-a2a0-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.280953Z" level=debug code=200 duration="159.093µs" method=POST path=/api/v1/runners/29-c62401b4-bc50-11ee-a2a0-fa163e351aea/execute user_agent="Faraday v2.9.0"
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.305791Z" level=info msg="Running execution" environment_id=29 executionID=b2fe4dfe-0863-4cb7-9601-4b6713b6bbfc package=api runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e3
51aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.306119Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo 
-ne \\\"\\\\x1EPoseidon /sbin/setuser user \\$(date +%s%3N)\\\\x1E\\\"; /sbin/setuser user /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon until \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; until [ -e /tmp/stderr_1706282079305958477.fifo ]; do
 sleep 0.01; done; (echo -ne \\\\\\\"\\\\\\\\x1EPoseidon python3 -B /usr/lib/python3.8/webpython.py -f aufgabe141.py \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; env CODEOCEAN=true /bin/bash -c \\\\\\\"python3 -B /usr/lib/python3.8/webpython.py -f a
ufgabe141.py\\\\\\\"; ec=\\\\\\$?; echo -ne \\\\\\\"\\\\\\\\x1EPoseidon exit \\\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\" && exit \\\\\\$ec) 2> /tmp/stderr_1706282079305958477.fifo\\\"\"" environment_id=29 package=nomad runner_id=29-c62401b
4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.306537Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon /sbin/setuser user $(date +%s%3N)\\x1E\"; /sbin/setuser user /bin/bash -c \"echo -ne
 \\\"\\\\x1EPoseidon mkfifo \\$(date +%s%3N)\\\\x1E\\\"; mkfifo /tmp/stderr_1706282079305958477.fifo && (cat /tmp/stderr_1706282079305958477.fifo; rm /tmp/stderr_1706282079305958477.fifo); ec=\\$?; echo -ne \\\"\\\\x1EPoseidon exit \\$ec \\$(dat
e +%s%3N)\\\\x1E\\\" && exit \\$ec\"" environment_id=29 package=nomad runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.309230Z" level=trace msg="Sending message to client" environment_id=29 message="&{start  0}" package=ws runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.491157Z" level=trace msg="Sending message to client" environment_id=29 message="&{stdout {\"cmd\": \"write\", \"stream\": \"stdout\", \"data\": \"Hey Simon.\"}\r\n{\"c
md\": \"write\", \"stream\": \"stdout\", \"data\": \" \"}\r\n{\"cmd\": \"write\", \"stream\": \"stdout\", \"data\": \"Wann geht dein Flug?\"}\r\n{\"cmd\": \"write\", \"stream\": \"stdout\", \"data\": \"\\n\"}\r\n 0}" package=ws runner_id=29-c624
01b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.517672Z" level=debug msg="Ignoring duplicate event" allocID=bd369cf7-2223-45b8-fe75-b45538b456b1 package=nomad
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.659452Z" level=info msg="Execution returned" environment_id=29 package=api runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.659574Z" level=debug code=200 duration=353.962601ms method=GET path=/api/v1/runners/29-c62401b4-bc50-11ee-a2a0-fa163e351aea/websocket user_agent=
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.660104Z" level=trace msg="Sending message to client" environment_id=29 message="&{exit  0}" package=ws runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.660446Z" level=trace msg="Writing loop done" environment_id=29 package=ws runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.663634Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo 
-ne \\\"\\\\x1EPoseidon /sbin/setuser user \\$(date +%s%3N)\\\\x1E\\\"; /sbin/setuser user /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon ls -l --time-style=+%s -1 --literal --recursive \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"./\\\\\\\\\\\\\\\\\\
\\\\\\\\\\\\\" \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"; env CODEOCEAN=true /bin/bash -c \\\\\\\"ls -l --time-style=+%s -1 --literal --recursive \\\\\\\\\\\\\\\"./\\\\\\\\\\\\\\\"\\\\\\\"; ec=\\\\\\$?; echo -ne \\\\\\\"\\\\\\\\x1EPoseidon exit \\
\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\" && exit \\\\\\$ec\\\"\"" environment_id=29 package=nomad runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.756867Z" level=debug msg="Handle Allocation Event" ClientStatus=running DesiredStatus=run NextAllocation= PrevAllocation= alloc_id=bd369cf7-2223-45b8-fe75-b45538b456b1
 package=nomad
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.756957Z" level=debug msg="Runner started" package=runner runner_id=29-9f995b5c-bc5d-11ee-9c2f-fa163e351aea startupDuration=660.81133ms
Jan 26 15:14:39 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:39.858209Z" level=debug code=200 duration=194.681535ms method=GET path=/api/v1/runners/29-c62401b4-bc50-11ee-a2a0-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 26 15:14:43 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:43.226737Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:44 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:44.438531Z" level=trace msg="Starting timeout for debugging the Goroutines" package=api
Jan 26 15:14:49 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:49.193911Z" level=trace msg="Requesting Nomad Exec" command="echo -ne \"\\x1EPoseidon unset \\${!NOMAD_@} $(date +%s%3N)\\x1E\"; unset ${!NOMAD_@} && /bin/bash -c \"echo 
-ne \\\"\\\\x1EPoseidon /bin/bash -c \\$(date +%s%3N)\\\\x1E\\\"; /bin/bash -c \\\"echo -ne \\\\\\\"\\\\\\\\x1EPoseidon rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin; \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\
\"; env CODEOCEAN=true /bin/bash -c \\\\\\\"rm --recursive --force ./* ;tar --extract --absolute-names --verbose --file=/dev/stdin;\\\\\\\"; ec=\\\\\\$?; echo -ne \\\\\\\"\\\\\\\\x1EPoseidon exit \\\\\\$ec \\\\\\$(date +%s%3N)\\\\\\\\x1E\\\\\\\"
 && exit \\\\\\$ec\\\"\"" environment_id=29 package=nomad runner_id=29-c62401b4-bc50-11ee-a2a0-fa163e351aea
Jan 26 15:14:50 poseidon-terraform systemd[1]: poseidon.service: Watchdog timeout (limit 15s)!
Jan 26 15:14:50 poseidon-terraform systemd[1]: poseidon.service: Killing process 36032 (poseidon) with signal SIGABRT.
Jan 26 15:14:50 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:50.834114Z" level=info msg="Received SIGINT, shutting down..." package=main
Jan 26 15:14:50 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:50.834279Z" level=info msg="Server closed" error="http: Server closed" listener="0.0.0.0:7200" package=main
Jan 26 15:14:50 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:50.834532Z" level=info msg="Stopped KeepEnvironmentsSynced" error="<nil>" package=environment
Jan 26 15:14:52 poseidon-terraform poseidon[36032]: 2024/01/26 15:14:52 influxdb2client E! Write error: internal error: unexpected error writing points to database: timeout, batch kept for retrying
Jan 26 15:14:52 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:52.556342Z" level=trace msg="Retrying to write influx data..." error="internal error: unexpected error writing points to database: timeout" package=monitoring retryAttemp
ts=0
Jan 26 15:14:52 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:52.556523Z" level=trace code=204 duration=9.329854566s method=GET path=/api/v1/health user_agent="Poseidon/213628b (d16SEGg0TGhTybvPurAtqitw9WVBuTX9DejRqYxMGTw=) Go-http-
client/1.1"
Jan 26 15:14:52 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:52.557481Z" level=debug code=204 duration=8.119019541s method=GET path=/api/v1/health user_agent="check_http/v2.2 (monitoring-plugins 2.2)"
Jan 26 15:14:52 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:52.571002Z" level=debug code=204 duration=3.378456256s method=PATCH path=/api/v1/runners/29-c62401b4-bc50-11ee-a2a0-fa163e351aea/files user_agent="Faraday v2.9.0"
Jan 26 15:14:52 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:52.704838Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 26 15:14:52 poseidon-terraform poseidon[36032]: time="2024-01-26T15:14:52.938960Z" level=debug msg="Stopping CPU profiler" package=main
Jan 26 15:14:53 poseidon-terraform poseidon-exit-alert.sh[37361]: Poseidon exited unsuccessful. Sending Sentry Event
Jan 26 15:14:53 poseidon-terraform poseidon-exit-alert.sh[37362]: Event dispatched: 76ac4536-0746-48cd-8f54-eec4a8b6718d
Jan 26 15:14:53 poseidon-terraform systemd[1]: poseidon.service: Failed with result 'watchdog'.
Jan 26 15:14:53 poseidon-terraform systemd[1]: poseidon.service: Consumed 15min 36.456s CPU time.
Jan 26 15:15:23 poseidon-terraform systemd[1]: poseidon.service: Scheduled restart job, restart counter is at 1.
Jan 26 15:15:23 poseidon-terraform systemd[1]: Stopped Poseidon.
Jan 26 15:15:23 poseidon-terraform systemd[1]: poseidon.service: Consumed 15min 36.456s CPU time.
Jan 26 15:15:23 poseidon-terraform systemd[1]: Starting Poseidon...

With this information, we can break down our scope. The bug must be between our request latency measurement and the Health route handling. In between is not much: the Gorilla library and our InfluxDB2Middleware. Therefore, I had a look into the InfluxDB2Middleware.

Peek into the InfluxDB client library

Let's consider the following example:

  1. We write our first InfluxDB data point. The library uses a channel to send the point to a buffer processor [1].
  2. When we write more points, there is a moment when the batch size is reached and the buffer will be flushed. As long as the flushing lasts, the buffer processor is not able to accept new data points. When we try to do so (i.e. when sending information about another health request), we are blocked [2].
  3. Normally, flushing the batch should be fast because it just uses another channel to write it to the write processor. But, we have the same principle here: When one batch is being processed, the attempt to write another batch will be blocked [3].
  4. At this point, the influx library tries to solve this issue by introducing a ring buffer. However, the usage of the ring buffer is synchronous. The write processor is still blocked by the HTTP request writing the Influx batch (which might timeout after a while) and blocked by first processing the whole buffer until it's empty or an error occurs and only then listens again to a batch write attempt.

All in all, the InfluxDB client library write handling is not as asynchronous as we expected.

To confirm this assumption, we might move our debugging process (#542) to the HTTPLoggingMiddleware or proceed with a fix.

To fix this issue, we might reduce our InfluxDB buffer configuration, but primary we should wrap all Influxdb writing attempts in a separate goroutine. With this measure, we remove the error propagation between InfluxDB and our request handling.
At this point, we only move the issue as for each request that is now not longer blocking, there is a background goroutine blocking. But, this issue is limited, as the number of blocked background goroutines is much less than (/bounded by) the number of all writing attempts within retryExpire. Therefore, this should not be an issue...

Awesome, that's a really good finding! When considering the various circumstances, I actually see the following "issues" worth investigating:

  • 0. Confirming that InfluxDB is causing the issue.
  • 1. Decouple InfluxDB writings from request handling.
  • 2. Ensure we receive more details about InfluxDB errors, e.g., in Sentry.
  • 3. Improve the reliability of metrics sent to Influx.

Following your investigation, and further noting that each of the watchdog-triggered restarts was preceded with an InfluxDB-related error message, I can follow your argumentation:

influxdb2client E! Write error: internal error: unexpected error writing points to database: timeout, batch kept for retrying

That having said, I would proceed with the list as given above. Probably, we can also close this current issue, after the first two steps (0., 1.), since we would except that this will be enough to have a stable watchdog. For the others, separate issues are likely beneficial.

Detailed Reasoning

  1. Confirming the issue right now is a cooperatively endeavour, but it helps us to be sure. While I have no doubt about Influx, I would like to be certain, so that we address the right issue.
  2. This will fix the actual bug, and we should probably start thinking and developing a solution even before we have the previously-mentioned confirmation (because we are sure "enough" and it's still a good idea).
  3. I know, we have discussed that plenty of times already. I know, that we are doing as much as we can already. I am aware that Influx' WriteFailedCallback does not indicate which attempt we are currently looking at. Still, I currently have the feeling we don't have any indication how often Influx doesn't work, and we also don't know how often a final resubmission fails.
  4. Regardless of the previous point, I got the feeling that our Influx integration is not very stable. Currently, I can't even tell whether it's the server component or the library (or the network in-between). Anyway, we should probably investigate and fix issues discovered.

Another issue occurred, this time triggering the Health route latency debugging.

Details

Jan 27 17:01:04 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:04.670403Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:04 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:04.671834Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:01:08 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:08.610756Z" level=trace msg="Current Memory Usage" heap=31239264 package=main
Jan 27 17:01:12 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:12.173072Z" level=trace code=204 duration="74.702µs" method=GET path=/api/v1/health user_agent="Poseidon/ae86b1c (ON1XcxHq2oGIDd8EbVZUjVF15HhDd-ZDLExmU8IUOvU=) Go-http-cl
ient/1.1"
Jan 27 17:01:12 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:12.173112Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:12 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:12.174389Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:01:19 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:19.676728Z" level=trace code=204 duration="71.205µs" method=GET path=/api/v1/health user_agent="Poseidon/ae86b1c (ON1XcxHq2oGIDd8EbVZUjVF15HhDd-ZDLExmU8IUOvU=) Go-http-cl
ient/1.1"
Jan 27 17:01:19 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:19.676833Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:19 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:19.678871Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:01:20 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:20.162493Z" level=debug code=204 duration="143.915µs" method=GET path=/api/v1/health user_agent="check_http/v2.2 (monitoring-plugins 2.2)"
Jan 27 17:01:20 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:20.162553Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:27 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:27.181283Z" level=trace code=204 duration="70.054µs" method=GET path=/api/v1/health user_agent="Poseidon/ae86b1c (ON1XcxHq2oGIDd8EbVZUjVF15HhDd-ZDLExmU8IUOvU=) Go-http-cl
ient/1.1"
Jan 27 17:01:27 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:27.181318Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:27 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:27.182349Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:01:34 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:34.683891Z" level=trace code=204 duration="83.139µs" method=GET path=/api/v1/health user_agent="Poseidon/ae86b1c (ON1XcxHq2oGIDd8EbVZUjVF15HhDd-ZDLExmU8IUOvU=) Go-http-cl
ient/1.1"
Jan 27 17:01:34 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:34.683962Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:34 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:34.687295Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:01:38 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:38.611797Z" level=trace msg="Current Memory Usage" heap=41185128 package=main
Jan 27 17:01:42 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:42.188206Z" level=trace code=204 duration="77.979µs" method=GET path=/api/v1/health user_agent="Poseidon/ae86b1c (ON1XcxHq2oGIDd8EbVZUjVF15HhDd-ZDLExmU8IUOvU=) Go-http-client/1.1"
Jan 27 17:01:42 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:42.188421Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:42 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:42.190268Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:01:45 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:45.233561Z" level=debug code=204 duration="80.683µs" method=GET path=/api/v1/health user_agent="Faraday v2.9.0"
Jan 27 17:01:45 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:45.233597Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:49 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:49.691844Z" level=trace code=204 duration="68.861µs" method=GET path=/api/v1/health user_agent="Poseidon/ae86b1c (ON1XcxHq2oGIDd8EbVZUjVF15HhDd-ZDLExmU8IUOvU=) Go-http-client/1.1"
Jan 27 17:01:49 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:49.692371Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:49 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:49.693169Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:01:57 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:57.194371Z" level=trace code=204 duration="68.931µs" method=GET path=/api/v1/health user_agent="Poseidon/ae86b1c (ON1XcxHq2oGIDd8EbVZUjVF15HhDd-ZDLExmU8IUOvU=) Go-http-client/1.1"
Jan 27 17:01:57 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:57.195010Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:01:57 poseidon-terraform poseidon[45103]: time="2024-01-27T17:01:57.195332Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:02:04 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:04.696287Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 27 17:02:08 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:08.612805Z" level=trace msg="Current Memory Usage" heap=35894360 package=main
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:09.696848Z" level=warning msg="Health route latency is too high"
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: goroutine profile: total 86
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 42 @ 0x43fb8e 0x4500c5 0xdb0bd0 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdb0bcf#011github.com/openHPI/poseidon/pkg/storage.(*localStorage[...]).periodicallySendMonitoringData+0xef#011/opt/poseidon/pkg/storage/storage.go:177
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 11 @ 0x43fb8e 0x4500c5 0xdbf958 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdbf957#011github.com/openHPI/poseidon/pkg/storage.(*localStorage[...]).periodicallySendMonitoringData+0xf7#011/opt/poseidon/pkg/storage/storage.go:177
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 7 @ 0x43fb8e 0x4500c5 0x7018a5 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7018a4#011net/http.(*persistConn).writeLoop+0xe4#011/usr/lib/go-1.21/src/net/http/transport.go:2421
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 4 @ 0x43fb8e 0x438597 0x46ad25 0x518b67 0x519e5a 0x519e48 0x5ce785 0x5dfc05 0x63037b 0x4b4d58 0x63055e 0x62db30 0x633df8 0x633e01 0x6ff0ea 0x670ca3 0x670dd3 0x6ffed9 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46ad24#011internal/poll.runtime_pollWait+0x84#011#011/usr/lib/go-1.21/src/runtime/netpoll.go:343
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x518b66#011internal/poll.(*pollDesc).wait+0x26#011#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:84
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e59#011internal/poll.(*pollDesc).waitRead+0x279#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:89
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e47#011internal/poll.(*FD).Read+0x267#011#011#011/usr/lib/go-1.21/src/internal/poll/fd_unix.go:164
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5ce784#011net.(*netFD).Read+0x24#011#011#011#011/usr/lib/go-1.21/src/net/fd_posix.go:55
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5dfc04#011net.(*conn).Read+0x44#011#011#011#011/usr/lib/go-1.21/src/net/net.go:179
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x63037a#011crypto/tls.(*atLeastReader).Read+0x3a#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:805
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x4b4d57#011bytes.(*Buffer).ReadFrom+0x97#011#011#011/usr/lib/go-1.21/src/bytes/buffer.go:211
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x63055d#011crypto/tls.(*Conn).readFromUntil+0xdd#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:827
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x62db2f#011crypto/tls.(*Conn).readRecordOrCCS+0x24f#011/usr/lib/go-1.21/src/crypto/tls/conn.go:625
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x633df7#011crypto/tls.(*Conn).readRecord+0x157#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:587
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x633e00#011crypto/tls.(*Conn).Read+0x160#011#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:1369
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6ff0e9#011net/http.(*persistConn).Read+0x49#011#011/usr/lib/go-1.21/src/net/http/transport.go:1954
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x670ca2#011bufio.(*Reader).fill+0x102#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:113
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x670dd2#011bufio.(*Reader).Peek+0x52#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:151
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6ffed8#011net/http.(*persistConn).readLoop+0x1b8#011#011/usr/lib/go-1.21/src/net/http/transport.go:2118
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 2 @ 0x43fb8e 0x40a845 0x40a497 0x8a2f88 0x8aa1c2 0xdb105f 0xdb0b6c 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8a2f87#011github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).WritePoint+0x127#011#011#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:262
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8aa1c1#011github.com/openHPI/poseidon/pkg/monitoring.WriteInfluxPoint+0x3e1#011#011#011#011#011/opt/poseidon/pkg/monitoring/influxdb2_middleware.go:167
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdb105e#011github.com/openHPI/poseidon/pkg/storage.(*localStorage[...]).sendMonitoringData+0x41e#011#011#011/opt/poseidon/pkg/storage/storage.go:171
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdb0b6b#011github.com/openHPI/poseidon/pkg/storage.(*localStorage[...]).periodicallySendMonitoringData+0x8b#011/opt/poseidon/pkg/storage/storage.go:182
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 2 @ 0x43fb8e 0x438597 0x46ad25 0x518b67 0x519e5a 0x519e48 0x5ce785 0x5dfc05 0x6ff0ea 0x670ca3 0x670dd3 0x6ffed9 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46ad24#011internal/poll.runtime_pollWait+0x84#011#011/usr/lib/go-1.21/src/runtime/netpoll.go:343
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x518b66#011internal/poll.(*pollDesc).wait+0x26#011#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:84
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e59#011internal/poll.(*pollDesc).waitRead+0x279#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:89
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e47#011internal/poll.(*FD).Read+0x267#011#011#011/usr/lib/go-1.21/src/internal/poll/fd_unix.go:164
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5ce784#011net.(*netFD).Read+0x24#011#011#011#011/usr/lib/go-1.21/src/net/fd_posix.go:55
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5dfc04#011net.(*conn).Read+0x44#011#011#011#011/usr/lib/go-1.21/src/net/net.go:179
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6ff0e9#011net/http.(*persistConn).Read+0x49#011#011/usr/lib/go-1.21/src/net/http/transport.go:1954
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x670ca2#011bufio.(*Reader).fill+0x102#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:113
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x670dd2#011bufio.(*Reader).Peek+0x52#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:151
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6ffed8#011net/http.(*persistConn).readLoop+0x1b8#011#011/usr/lib/go-1.21/src/net/http/transport.go:2118
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x410b09 0x46d0e9 0x96e2f3 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46d0e8#011os/signal.signal_recv+0x28#011/usr/lib/go-1.21/src/runtime/sigqueue.go:152
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x96e2f2#011os/signal.loop+0x12#011#011/usr/lib/go-1.21/src/os/signal/signal_unix.go:23
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x434f71 0x46a95d 0x7b6971 0x7b67a5 0x7b3246 0x7c3905 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46a95c#011runtime/pprof.runtime_goroutineProfileWithLabels+0x1c#011#011/usr/lib/go-1.21/src/runtime/mprof.go:844
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7b6970#011runtime/pprof.writeRuntimeProfile+0xb0#011#011#011#011/usr/lib/go-1.21/src/runtime/pprof/pprof.go:734
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7b67a4#011runtime/pprof.writeGoroutine+0x44#011#011#011#011/usr/lib/go-1.21/src/runtime/pprof/pprof.go:694
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7b3245#011runtime/pprof.(*Profile).WriteTo+0x145#011#011#011#011/usr/lib/go-1.21/src/runtime/pprof/pprof.go:329
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7c3904#011github.com/openHPI/poseidon/pkg/logging.debugGoroutines+0x184#011/opt/poseidon/pkg/logging/logging.go:134
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x40a845 0x40a497 0x8a28ef 0x8a2756 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8a28ee#011github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).flushBuffer+0x8e#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:174
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8a2755#011github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).bufferProc+0x235#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:154
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x40a845 0x40a497 0x8a2f88 0x8aa1c2 0x8a96e5 0x6e8649 0x7c3226 0x6e8649 0x798425 0xe4ac31 0x6e8649 0x6eb32e 0x6e7214 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8a2f87#011github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).WritePoint+0x127#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:262
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8aa1c1#011github.com/openHPI/poseidon/pkg/monitoring.WriteInfluxPoint+0x3e1#011#011#011/opt/poseidon/pkg/monitoring/influxdb2_middleware.go:167
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8a96e4#011github.com/openHPI/poseidon/pkg/monitoring.InfluxDB2Middleware.func1+0x5a4#011#011/opt/poseidon/pkg/monitoring/influxdb2_middleware.go:116
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6e8648#011net/http.HandlerFunc.ServeHTTP+0x28#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2136
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7c3225#011github.com/openHPI/poseidon/pkg/logging.HTTPLoggingMiddleware.func1+0x265#011#011/opt/poseidon/pkg/logging/logging.go:95
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6e8648#011net/http.HandlerFunc.ServeHTTP+0x28#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2136
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x798424#011github.com/gorilla/mux.(*Router).ServeHTTP+0x1c4#011#011#011#011#011/home/poseidon/go/pkg/mod/github.com/gorilla/mux@v1.8.1/mux.go:212
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe4ac30#011main.initServer.(*Handler).Handle.(*Handler).handle.func1+0x2f0#011#011#011#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.26.0/http/sentryhttp.go:118
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6e8648#011net/http.HandlerFunc.ServeHTTP+0x28#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2136
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6eb32d#011net/http.serverHandler.ServeHTTP+0x8d#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2938
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6e7213#011net/http.(*conn).serve+0x5f3#011#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2009
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x40b5ad 0x40b1d2 0xdabae5 0xdaad45 0xdb85c7 0xe4a579 0xe3c98e 0xdb3ee2 0xdb3a32 0xdb3e5b 0xe3c539 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdabae4#011github.com/openHPI/poseidon/internal/nomad.receiveAndHandleNomadAPIEvents+0x44#011#011#011#011#011/opt/poseidon/internal/nomad/nomad.go:296
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdaad44#011github.com/openHPI/poseidon/internal/nomad.(*APIClient).WatchEventStream+0x144#011#011#011#011#011/opt/poseidon/internal/nomad/nomad.go:246
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdb85c6#011github.com/openHPI/poseidon/internal/runner.(*NomadRunnerManager).SynchronizeRunners+0x126#011#011#011/opt/poseidon/internal/runner/nomad_manager.go:110
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe4a578#011main.synchronizeNomad.func1+0x98#011#011#011#011#011#011#011#011#011#011/opt/poseidon/cmd/poseidon/main.go:361
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe3c98d#011github.com/openHPI/poseidon/internal/environment.(*NomadEnvironmentManager).KeepEnvironmentsSynced.func1+0xed#011/opt/poseidon/internal/environment/nomad_manager.go:183
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdb3ee1#011github.com/openHPI/poseidon/pkg/util.RetryConstantAttemptsWithContext.retryConstant.func1+0x41#011#011#011/opt/poseidon/pkg/util/util.go:38
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdb3a31#011github.com/openHPI/poseidon/pkg/util.retryAttempts+0xd1#011#011#011#011#011#011#011#011/opt/poseidon/pkg/util/util.go:54
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xdb3e5a#011github.com/openHPI/poseidon/pkg/util.RetryConstantAttemptsWithContext+0x5a#011#011#011#011#011/opt/poseidon/pkg/util/util.go:81
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe3c538#011github.com/openHPI/poseidon/internal/environment.(*NomadEnvironmentManager).KeepEnvironmentsSynced+0x98#011#011/opt/poseidon/internal/environment/nomad_manager.go:175
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x438597 0x46ad25 0x518b67 0x519e5a 0x519e48 0x5ce785 0x5dfc05 0x63037b 0x4b4d58 0x63055e 0x62db30 0x633df8 0x633e01 0x6e1097 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46ad24#011internal/poll.runtime_pollWait+0x84#011#011/usr/lib/go-1.21/src/runtime/netpoll.go:343
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x518b66#011internal/poll.(*pollDesc).wait+0x26#011#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:84
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e59#011internal/poll.(*pollDesc).waitRead+0x279#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:89
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e47#011internal/poll.(*FD).Read+0x267#011#011#011/usr/lib/go-1.21/src/internal/poll/fd_unix.go:164
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5ce784#011net.(*netFD).Read+0x24#011#011#011#011/usr/lib/go-1.21/src/net/fd_posix.go:55
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5dfc04#011net.(*conn).Read+0x44#011#011#011#011/usr/lib/go-1.21/src/net/net.go:179
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x63037a#011crypto/tls.(*atLeastReader).Read+0x3a#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:805
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x4b4d57#011bytes.(*Buffer).ReadFrom+0x97#011#011#011/usr/lib/go-1.21/src/bytes/buffer.go:211
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x63055d#011crypto/tls.(*Conn).readFromUntil+0xdd#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:827
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x62db2f#011crypto/tls.(*Conn).readRecordOrCCS+0x24f#011/usr/lib/go-1.21/src/crypto/tls/conn.go:625
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x633df7#011crypto/tls.(*Conn).readRecord+0x157#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:587
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x633e00#011crypto/tls.(*Conn).Read+0x160#011#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:1369
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6e1096#011net/http.(*connReader).backgroundRead+0x36#011/usr/lib/go-1.21/src/net/http/server.go:683
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x438597 0x46ad25 0x518b67 0x519e5a 0x519e48 0x5ce785 0x5dfc05 0x63037b 0x4b4d58 0x63055e 0x62db30 0x633df8 0x633e01 0x6e14cb 0x670ca3 0x670dd3 0x6e737c 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46ad24#011internal/poll.runtime_pollWait+0x84#011#011/usr/lib/go-1.21/src/runtime/netpoll.go:343
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x518b66#011internal/poll.(*pollDesc).wait+0x26#011#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:84
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e59#011internal/poll.(*pollDesc).waitRead+0x279#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:89
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e47#011internal/poll.(*FD).Read+0x267#011#011#011/usr/lib/go-1.21/src/internal/poll/fd_unix.go:164
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5ce784#011net.(*netFD).Read+0x24#011#011#011#011/usr/lib/go-1.21/src/net/fd_posix.go:55
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5dfc04#011net.(*conn).Read+0x44#011#011#011#011/usr/lib/go-1.21/src/net/net.go:179
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x63037a#011crypto/tls.(*atLeastReader).Read+0x3a#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:805
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x4b4d57#011bytes.(*Buffer).ReadFrom+0x97#011#011#011/usr/lib/go-1.21/src/bytes/buffer.go:211
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x63055d#011crypto/tls.(*Conn).readFromUntil+0xdd#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:827
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x62db2f#011crypto/tls.(*Conn).readRecordOrCCS+0x24f#011/usr/lib/go-1.21/src/crypto/tls/conn.go:625
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x633df7#011crypto/tls.(*Conn).readRecord+0x157#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:587
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x633e00#011crypto/tls.(*Conn).Read+0x160#011#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:1369
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6e14ca#011net/http.(*connReader).Read+0x14a#011#011/usr/lib/go-1.21/src/net/http/server.go:791
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x670ca2#011bufio.(*Reader).fill+0x102#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:113
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x670dd2#011bufio.(*Reader).Peek+0x52#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:151
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6e737b#011net/http.(*conn).serve+0x75b#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2044
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x438597 0x46ad25 0x518b67 0x519e5a 0x519e48 0x5ce785 0x5dfc05 0x63037b 0x4b4d58 0x63055e 0x62db30 0x633df8 0x633e01 0x6ff0ea 0x670ca3 0x6717a9 0x69d75c 0x69d165 0x69d4d3 0x6f431b 0x6f41f8 0x7035bf 0x670ca3 0x6713c7 0x688587 0x686a28 0x686c5b 0x68ae02 0x8de622 0x725fa8 0x725b9d
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46ad24#011internal/poll.runtime_pollWait+0x84#011#011#011/usr/lib/go-1.21/src/runtime/netpoll.go:343
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x518b66#011internal/poll.(*pollDesc).wait+0x26#011#011#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:84
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e59#011internal/poll.(*pollDesc).waitRead+0x279#011#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:89
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x519e47#011internal/poll.(*FD).Read+0x267#011#011#011#011/usr/lib/go-1.21/src/internal/poll/fd_unix.go:164
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5ce784#011net.(*netFD).Read+0x24#011#011#011#011#011/usr/lib/go-1.21/src/net/fd_posix.go:55
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5dfc04#011net.(*conn).Read+0x44#011#011#011#011#011/usr/lib/go-1.21/src/net/net.go:179
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x63037a#011crypto/tls.(*atLeastReader).Read+0x3a#011#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:805
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x4b4d57#011bytes.(*Buffer).ReadFrom+0x97#011#011#011#011/usr/lib/go-1.21/src/bytes/buffer.go:211
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x63055d#011crypto/tls.(*Conn).readFromUntil+0xdd#011#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:827
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x62db2f#011crypto/tls.(*Conn).readRecordOrCCS+0x24f#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:625
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x633df7#011crypto/tls.(*Conn).readRecord+0x157#011#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:587
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x633e00#011crypto/tls.(*Conn).Read+0x160#011#011#011#011/usr/lib/go-1.21/src/crypto/tls/conn.go:1369
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6ff0e9#011net/http.(*persistConn).Read+0x49#011#011#011/usr/lib/go-1.21/src/net/http/transport.go:1954
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x670ca2#011bufio.(*Reader).fill+0x102#011#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:113
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6717a8#011bufio.(*Reader).ReadSlice+0x28#011#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:379
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x69d75b#011net/http/internal.readChunkLine+0x1b#011#011#011/usr/lib/go-1.21/src/net/http/internal/chunked.go:156
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x69d164#011net/http/internal.(*chunkedReader).beginChunk+0x24#011/usr/lib/go-1.21/src/net/http/internal/chunked.go:49
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x69d4d2#011net/http/internal.(*chunkedReader).Read+0x132#011#011/usr/lib/go-1.21/src/net/http/internal/chunked.go:125
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6f431a#011net/http.(*body).readLocked+0x3a#011#011#011/usr/lib/go-1.21/src/net/http/transfer.go:839
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6f41f7#011net/http.(*body).Read+0x117#011#011#011#011/usr/lib/go-1.21/src/net/http/transfer.go:831
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7035be#011net/http.(*bodyEOFSignal).Read+0x13e#011#011#011/usr/lib/go-1.21/src/net/http/transport.go:2810
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x670ca2#011bufio.(*Reader).fill+0x102#011#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:113
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6713c6#011bufio.(*Reader).ReadByte+0x26#011#011#011#011/usr/lib/go-1.21/src/bufio/bufio.go:272
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x688586#011compress/flate.(*decompressor).moreBits+0x26#011#011/usr/lib/go-1.21/src/compress/flate/inflate.go:697
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x686a27#011compress/flate.(*decompressor).nextBlock+0x27#011#011/usr/lib/go-1.21/src/compress/flate/inflate.go:304
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x686c5a#011compress/flate.(*decompressor).Read+0x5a#011#011/usr/lib/go-1.21/src/compress/flate/inflate.go:348
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x68ae01#011compress/gzip.(*Reader).Read+0xa1#011#011#011/usr/lib/go-1.21/src/compress/gzip/gunzip.go:252
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8de621#011github.com/hashicorp/nomad/api.(*multiCloser).Read+0x21#011/home/poseidon/go/pkg/mod/github.com/hashicorp/nomad/api@v0.0.0-20230922145839-20eadc7b296c/api.go:825
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x725fa7#011encoding/json.(*Decoder).refill+0x187#011#011#011/usr/lib/go-1.21/src/encoding/json/stream.go:165
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x725b9c#011encoding/json.(*Decoder).readValue+0x9c#011#011#011/usr/lib/go-1.21/src/encoding/json/stream.go:140
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x438597 0x46ad25 0x518b67 0x51e04c 0x51e03a 0x5d07a9 0x5e8b5e 0x5e7d10 0x669d87 0x6eb784 0x6ec0e6 0xe49365 0xe49172 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46ad24#011internal/poll.runtime_pollWait+0x84#011#011/usr/lib/go-1.21/src/runtime/netpoll.go:343
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x518b66#011internal/poll.(*pollDesc).wait+0x26#011#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:84
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x51e04b#011internal/poll.(*pollDesc).waitRead+0x2ab#011/usr/lib/go-1.21/src/internal/poll/fd_poll_runtime.go:89
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x51e039#011internal/poll.(*FD).Accept+0x299#011#011/usr/lib/go-1.21/src/internal/poll/fd_unix.go:611
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5d07a8#011net.(*netFD).accept+0x28#011#011#011/usr/lib/go-1.21/src/net/fd_unix.go:172
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5e8b5d#011net.(*TCPListener).accept+0x1d#011#011#011/usr/lib/go-1.21/src/net/tcpsock_posix.go:152
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x5e7d0f#011net.(*TCPListener).Accept+0x2f#011#011#011/usr/lib/go-1.21/src/net/tcpsock.go:315
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x669d86#011crypto/tls.(*listener).Accept+0x26#011#011/usr/lib/go-1.21/src/crypto/tls/tls.go:66
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6eb783#011net/http.(*Server).Serve+0x363#011#011#011/usr/lib/go-1.21/src/net/http/server.go:3056
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6ec0e5#011net/http.(*Server).ServeTLS+0x4a5#011#011/usr/lib/go-1.21/src/net/http/server.go:3126
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe49364#011main.serveHTTPListener+0x144#011#011#011/opt/poseidon/cmd/poseidon/main.go:222
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe49171#011main.serveHTTPListeners.func1+0xd1#011#011/opt/poseidon/cmd/poseidon/main.go:209
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x4500c5 0x700a45 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x700a44#011net/http.(*persistConn).readLoop+0xd24#011/usr/lib/go-1.21/src/net/http/transport.go:2238
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x4500c5 0x7028d9 0x6f6c3a 0x6dfa93 0x6a0b66 0x6a03f8 0x6a26d2 0x6a1d93 0x88ee7a 0x88ea05 0x88e965 0x88e7f6 0x89618e 0x89537e 0x8a2a5f 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7028d8#011net/http.(*persistConn).roundTrip+0x978#011#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/transport.go:2652
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6f6c39#011net/http.(*Transport).roundTrip+0x799#011#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/transport.go:604
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6dfa92#011net/http.(*Transport).RoundTrip+0x12#011#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/roundtrip.go:17
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a0b65#011net/http.send+0x605#011#011#011#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/client.go:260
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a03f7#011net/http.(*Client).send+0x97#011#011#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/client.go:181
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a26d1#011net/http.(*Client).do+0x911#011#011#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/client.go:724
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a1d92#011net/http.(*Client).Do+0x12#011#011#011#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/client.go:590
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x88ee79#011github.com/influxdata/influxdb-client-go/v2/api/http.(*service).DoHTTPRequestWithResponse+0x339#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/http/service.go:138
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x88ea04#011github.com/influxdata/influxdb-client-go/v2/api/http.(*service).DoHTTPRequest+0x24#011#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/http/service.go:110
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x88e964#011github.com/influxdata/influxdb-client-go/v2/api/http.(*service).doHTTPRequestWithURL+0xe4#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/http/service.go:106
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x88e7f5#011github.com/influxdata/influxdb-client-go/v2/api/http.(*service).DoPostRequest+0x55#011#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/http/service.go:98
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x89618d#011github.com/influxdata/influxdb-client-go/v2/internal/write.(*Service).WriteBatch+0x2cd#011#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/internal/write/service.go:299
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x89537d#011github.com/influxdata/influxdb-client-go/v2/internal/write.(*Service).HandleWrite+0x57d#011#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/internal/write/service.go:165
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x8a2a5e#011github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).writeProc+0x13e#011#011#011/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:192
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x4500c5 0x7028d9 0x6f6c3a 0x6dfa93 0x6a0b66 0x6a03f8 0x6a26d2 0x7920ab 0x792096 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7028d8#011net/http.(*persistConn).roundTrip+0x978#011#011#011#011/usr/lib/go-1.21/src/net/http/transport.go:2652
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6f6c39#011net/http.(*Transport).roundTrip+0x799#011#011#011#011/usr/lib/go-1.21/src/net/http/transport.go:604
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6dfa92#011net/http.(*Transport).RoundTrip+0x12#011#011#011#011/usr/lib/go-1.21/src/net/http/roundtrip.go:17
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a0b65#011net/http.send+0x605#011#011#011#011#011#011/usr/lib/go-1.21/src/net/http/client.go:260
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a03f7#011net/http.(*Client).send+0x97#011#011#011#011#011/usr/lib/go-1.21/src/net/http/client.go:181
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a26d1#011net/http.(*Client).do+0x911#011#011#011#011#011/usr/lib/go-1.21/src/net/http/client.go:724
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7920aa#011net/http.(*Client).Do+0x18a#011#011#011#011#011/usr/lib/go-1.21/src/net/http/client.go:590
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x792095#011github.com/getsentry/sentry-go.(*HTTPTransport).worker+0x175#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.26.0/transport.go:476
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x4500c5 0x7028d9 0x6f6c3a 0x6dfa93 0x6a0b66 0x6a03f8 0x6a26d2 0xe49cef 0xe49cdd 0xe49ad2 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7028d8#011net/http.(*persistConn).roundTrip+0x978#011/usr/lib/go-1.21/src/net/http/transport.go:2652
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6f6c39#011net/http.(*Transport).roundTrip+0x799#011/usr/lib/go-1.21/src/net/http/transport.go:604
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6dfa92#011net/http.(*Transport).RoundTrip+0x12#011/usr/lib/go-1.21/src/net/http/roundtrip.go:17
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a0b65#011net/http.send+0x605#011#011#011/usr/lib/go-1.21/src/net/http/client.go:260
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a03f7#011net/http.(*Client).send+0x97#011#011/usr/lib/go-1.21/src/net/http/client.go:181
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x6a26d1#011net/http.(*Client).do+0x911#011#011/usr/lib/go-1.21/src/net/http/client.go:724
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe49cee#011net/http.(*Client).Do+0x14e#011#011/usr/lib/go-1.21/src/net/http/client.go:590
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe49cdc#011main.notifySystemdWatchdog+0x13c#011/opt/poseidon/cmd/poseidon/main.go:294
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe49ad1#011main.systemdWatchdogLoop+0x391#011#011/opt/poseidon/cmd/poseidon/main.go:282
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x4500c5 0x782d89 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x782d88#011github.com/getsentry/sentry-go.(*profileRecorder).run+0x188#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.26.0/profiler.go:159
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x4500c5 0xe48ba5 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe48ba4#011main.watchMemoryAndAlert+0x384#011/opt/poseidon/cmd/poseidon/main.go:165
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x4500c5 0xe4aef9 0xe4b3a5 0x43f71b 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe4aef8#011main.shutdownOnOSSignal+0x198#011/opt/poseidon/cmd/poseidon/main.go:416
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe4b3a4#011main.main+0x2a4#011#011#011/opt/poseidon/cmd/poseidon/main.go:455
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x43f71a#011runtime.main+0x2ba#011#011/usr/lib/go-1.21/src/runtime/proc.go:267
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x4510f8 0x4510cf 0x46c985 0x47d208 0xe48fe9 0xe48c7e 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46c984#011sync.runtime_Semacquire+0x24#011/usr/lib/go-1.21/src/runtime/sema.go:62
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x47d207#011sync.(*WaitGroup).Wait+0x47#011/usr/lib/go-1.21/src/sync/waitgroup.go:116
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe48fe8#011main.serveHTTPListeners+0x148#011/opt/poseidon/cmd/poseidon/main.go:212
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110xe48c7d#011main.runServer+0x7d#011#011/opt/poseidon/cmd/poseidon/main.go:180
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: 1 @ 0x43fb8e 0x46da65 0x7b6e8a 0x470b01
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x46da64#011time.Sleep+0x124#011#011#011/usr/lib/go-1.21/src/runtime/time.go:195
Jan 27 17:02:09 poseidon-terraform poseidon[45103]: ##0110x7b6e89#011runtime/pprof.profileWriter+0x49#011/usr/lib/go-1.21/src/runtime/pprof/pprof.go:809
Jan 27 17:02:12 poseidon-terraform systemd[1]: poseidon.service: Watchdog timeout (limit 15s)!
Jan 27 17:02:12 poseidon-terraform systemd[1]: poseidon.service: Killing process 45103 (poseidon) with signal SIGABRT.
Jan 27 17:02:12 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:12.331706Z" level=info msg="Received SIGINT, shutting down..." package=main
Jan 27 17:02:12 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:12.331845Z" level=info msg="Server closed" error="http: Server closed" listener="0.0.0.0:7200" package=main
Jan 27 17:02:12 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:12.332051Z" level=info msg="Stopped KeepEnvironmentsSynced" error="<nil>" package=environment
Jan 27 17:02:16 poseidon-terraform poseidon[45103]: 2024/01/27 17:02:16 influxdb2client E! Write error: internal error: unexpected error writing points to database: timeout, batch kept for retrying
Jan 27 17:02:16 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:16.887057Z" level=trace msg="Retrying to write influx data..." error="internal error: unexpected error writing points to database: timeout" package=monitoring retryAttempts=0
Jan 27 17:02:16 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:16.887246Z" level=trace code=204 duration=12.191047733s method=GET path=/api/v1/health user_agent="Poseidon/ae86b1c (ON1XcxHq2oGIDd8EbVZUjVF15HhDd-ZDLExmU8IUOvU=) Go-http-client/1.1"
Jan 27 17:02:17 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:17.050967Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 27 17:02:17 poseidon-terraform poseidon[45103]: time="2024-01-27T17:02:17.071427Z" level=debug msg="Stopping CPU profiler" package=main
Jan 27 17:02:17 poseidon-terraform poseidon-exit-alert.sh[62701]: Poseidon exited unsuccessful. Sending Sentry Event
Jan 27 17:02:17 poseidon-terraform poseidon-exit-alert.sh[62702]: Event dispatched: 193c16dd-cffc-4820-bf3e-47bb784cb750
Jan 27 17:02:17 poseidon-terraform systemd[1]: poseidon.service: Failed with result 'watchdog'.
Jan 27 17:02:17 poseidon-terraform systemd[1]: poseidon.service: Consumed 2h 56min 53.039s CPU time.
Jan 27 17:02:47 poseidon-terraform systemd[1]: poseidon.service: Scheduled restart job, restart counter is at 1.
Jan 27 17:02:47 poseidon-terraform systemd[1]: Stopped Poseidon.
Jan 27 17:02:47 poseidon-terraform systemd[1]: poseidon.service: Consumed 2h 56min 53.039s CPU time.
Jan 27 17:02:47 poseidon-terraform systemd[1]: Starting Poseidon...

Since you're better at interpreting these logs, I leave that to you. Nevertheless, I'll continue merging and deploying #544, so that we can potentially observe a change.

Since I deploying the changes in #544, we did not experience another watchdog-triggered restart. Still, some of the InfluxDB data cannot be sent to our monitoring server, if I am reading the logs right.

Details

Jan 29 01:26:42 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:42.173298Z" level=trace code=204 duration="55.967µs" method=GET path=/api/v1/health user_agent="Poseidon/08c3a3d (NG5vuaT9dfhFjpSBCBmLAaL7Re_tWeQDoGk72hx_wXA=) Go-http-client/1.1"
Jan 29 01:26:42 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:42.173971Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 29 01:26:42 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:42.174196Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 29 01:26:45 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:45.234427Z" level=debug code=204 duration="69.413µs" method=GET path=/api/v1/health user_agent="Faraday v2.9.0"
Jan 29 01:26:45 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:45.234479Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 29 01:26:49 poseidon-terraform poseidon[80367]: 2024/01/29 01:26:49 influxdb2client E! Write error: internal error: unexpected error writing points to database: timeout, batch kept for retrying
Jan 29 01:26:49 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:49.581341Z" level=trace msg="Retrying to write influx data..." error="internal error: unexpected error writing points to database: timeout" package=monitoring retryAttempts=0
Jan 29 01:26:49 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:49.674934Z" level=trace code=204 duration="56.598µs" method=GET path=/api/v1/health user_agent="Poseidon/08c3a3d (NG5vuaT9dfhFjpSBCBmLAaL7Re_tWeQDoGk72hx_wXA=) Go-http-client/1.1"
Jan 29 01:26:49 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:49.674986Z" level=trace msg="Starting timeout for debugging the Goroutines"
Jan 29 01:26:49 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:49.675930Z" level=trace msg="Notified Systemd Watchdog" package=main
Jan 29 01:26:51 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:51.699057Z" level=trace msg="Current Memory Usage" heap=39787152 package=main
Jan 29 01:26:57 poseidon-terraform poseidon[80367]: time="2024-01-29T01:26:57.177511Z" level=trace code=204 duration="73.58µs" method=GET path=/api/v1/health user_agent="Poseidon/08c3a3d (NG5vuaT9dfhFjpSBCBmLAaL7Re_tWeQDoGk72hx_wXA=) Go-http-client/1.1"

Another issue occurred, this time triggering the Health route latency debugging.

Great! This occurrence confirms our assumption.

Details

The stuck watchdog request.

1 @ 0x43fb8e 0x4500c5 0x7028d9 0x6f6c3a 0x6dfa93 0x6a0b66 0x6a03f8 0x6a26d2 0xe49cef 0xe49cdd 0xe49ad2 0x470b01
#	0x7028d8	net/http.(*persistConn).roundTrip+0x978	/usr/lib/go-1.21/src/net/http/transport.go:2652
#	0x6f6c39	net/http.(*Transport).roundTrip+0x799	/usr/lib/go-1.21/src/net/http/transport.go:604
#	0x6dfa92	net/http.(*Transport).RoundTrip+0x12	/usr/lib/go-1.21/src/net/http/roundtrip.go:17
#	0x6a0b65	net/http.send+0x605			            /usr/lib/go-1.21/src/net/http/client.go:260
#	0x6a03f7	net/http.(*Client).send+0x97	    	/usr/lib/go-1.21/src/net/http/client.go:181
#	0x6a26d1	net/http.(*Client).do+0x911		        /usr/lib/go-1.21/src/net/http/client.go:724
#	0xe49cee	net/http.(*Client).Do+0x14e	        	/usr/lib/go-1.21/src/net/http/client.go:590
#	0xe49cdc	main.notifySystemdWatchdog+0x13c	    /opt/poseidon/cmd/poseidon/main.go:294
#	0xe49ad1	main.systemdWatchdogLoop+0x391	    	/opt/poseidon/cmd/poseidon/main.go:282

The InfluxDB2Middleware is stuck at writing the Influx Data Point (See "1. We write our first InfluxDB data point. The library uses a channel to send the point to a buffer processor [1].")

1 @ 0x43fb8e 0x40a845 0x40a497 0x8a2f88 0x8aa1c2 0x8a96e5 0x6e8649 0x7c3226 0x6e8649 0x798425 0xe4ac31 0x6e8649 0x6eb32e 0x6e7214 0x470b01
#	0x8a2f87	github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).WritePoint+0x127	/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:262
#	0x8aa1c1	github.com/openHPI/poseidon/pkg/monitoring.WriteInfluxPoint+0x3e1			        /opt/poseidon/pkg/monitoring/influxdb2_middleware.go:167
#	0x8a96e4	github.com/openHPI/poseidon/pkg/monitoring.InfluxDB2Middleware.func1+0x5a4		    /opt/poseidon/pkg/monitoring/influxdb2_middleware.go:116
#	0x6e8648	net/http.HandlerFunc.ServeHTTP+0x28						                        	/usr/lib/go-1.21/src/net/http/server.go:2136
#	0x7c3225	github.com/openHPI/poseidon/pkg/logging.HTTPLoggingMiddleware.func1+0x265		    /opt/poseidon/pkg/logging/logging.go:95
#	0x6e8648	net/http.HandlerFunc.ServeHTTP+0x28						                        	/usr/lib/go-1.21/src/net/http/server.go:2136
#	0x798424	github.com/gorilla/mux.(*Router).ServeHTTP+0x1c4				                	/home/poseidon/go/pkg/mod/github.com/gorilla/mux@v1.8.1/mux.go:212
#	0xe4ac30	main.initServer.(*Handler).Handle.(*Handler).handle.func1+0x2f0		        		/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.26.0/http/sentryhttp.go:118
#	0x6e8648	net/http.HandlerFunc.ServeHTTP+0x28					                        		/usr/lib/go-1.21/src/net/http/server.go:2136
#	0x6eb32d	net/http.serverHandler.ServeHTTP+0x8d				                    			/usr/lib/go-1.21/src/net/http/server.go:2938
#	0x6e7213	net/http.(*conn).serve+0x5f3						                        		/usr/lib/go-1.21/src/net/http/server.go:2009

Not only is the Middleware stuck, but there are also two regular updates of some storages stuck.

2 @ 0x43fb8e 0x40a845 0x40a497 0x8a2f88 0x8aa1c2 0xdb105f 0xdb0b6c 0x470b01
#	0x8a2f87	github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).WritePoint+0x127			        /home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:262
#	0x8aa1c1	github.com/openHPI/poseidon/pkg/monitoring.WriteInfluxPoint+0x3e1					                /opt/poseidon/pkg/monitoring/influxdb2_middleware.go:167
#	0xdb105e	github.com/openHPI/poseidon/pkg/storage.(*localStorage[...]).sendMonitoringData+0x41e		        /opt/poseidon/pkg/storage/storage.go:171
#	0xdb0b6b	github.com/openHPI/poseidon/pkg/storage.(*localStorage[...]).periodicallySendMonitoringData+0x8b	/opt/poseidon/pkg/storage/storage.go:182

As described in Point 3, the buffer processor stuck flushing its buffer.

1 @ 0x43fb8e 0x40a845 0x40a497 0x8a28ef 0x8a2756 0x470b01
#	0x8a28ee	github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).flushBuffer+0x8e	/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:174
#	0x8a2755	github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).bufferProc+0x235	/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:154

The write processor is stuck waiting for an HTTP request to our InfluxDB server. Therefore, the issue might either be caused by our InfluxDB server or by the network in between.

1 @ 0x43fb8e 0x4500c5 0x7028d9 0x6f6c3a 0x6dfa93 0x6a0b66 0x6a03f8 0x6a26d2 0x6a1d93 0x88ee7a 0x88ea05 0x88e965 0x88e7f6 0x89618e 0x89537e 0x8a2a5f 0x470b01
#	0x7028d8	net/http.(*persistConn).roundTrip+0x978								/usr/lib/go-1.21/src/net/http/transport.go:2652
#	0x6f6c39	net/http.(*Transport).roundTrip+0x799								/usr/lib/go-1.21/src/net/http/transport.go:604
#	0x6dfa92	net/http.(*Transport).RoundTrip+0x12								/usr/lib/go-1.21/src/net/http/roundtrip.go:17
#	0x6a0b65	net/http.send+0x605										/usr/lib/go-1.21/src/net/http/client.go:260
#	0x6a03f7	net/http.(*Client).send+0x97									/usr/lib/go-1.21/src/net/http/client.go:181
#	0x6a26d1	net/http.(*Client).do+0x911									/usr/lib/go-1.21/src/net/http/client.go:724
#	0x6a1d92	net/http.(*Client).Do+0x12									/usr/lib/go-1.21/src/net/http/client.go:590
#	0x88ee79	github.com/influxdata/influxdb-client-go/v2/api/http.(*service).DoHTTPRequestWithResponse+0x339	/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/http/service.go:138
#	0x88ea04	github.com/influxdata/influxdb-client-go/v2/api/http.(*service).DoHTTPRequest+0x24		/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/http/service.go:110
#	0x88e964	github.com/influxdata/influxdb-client-go/v2/api/http.(*service).doHTTPRequestWithURL+0xe4	/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/http/service.go:106
#	0x88e7f5	github.com/influxdata/influxdb-client-go/v2/api/http.(*service).DoPostRequest+0x55		/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/http/service.go:98
#	0x89618d	github.com/influxdata/influxdb-client-go/v2/internal/write.(*Service).WriteBatch+0x2cd		/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/internal/write/service.go:299
#	0x89537d	github.com/influxdata/influxdb-client-go/v2/internal/write.(*Service).HandleWrite+0x57d		/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/internal/write/service.go:165
#	0x8a2a5e	github.com/influxdata/influxdb-client-go/v2/api.(*WriteAPIImpl).writeProc+0x13e			/home/poseidon/go/pkg/mod/github.com/influxdata/influxdb-client-go/v2@v2.13.0/api/write.go:192

Still, some of the InfluxDB data cannot be sent to our monitoring server

I agree. Maybe it's the network or maybe it's our InfluxDB server. Next to issues for Points 2 and 3, let's create an issue checking for warnings in our InfluxDB server logs to rule out the possibility of our InfluxDB server having (overload) issues?!

Great! This occurrence confirms our assumption.

Awesome, and nice findings with a great explanation 👍👍

Next to issues for Points 2 and 3, [...]

Yes, let's create these and close this (current) issue.

an issue checking for warnings in our InfluxDB server logs to rule out the possibility of our InfluxDB server having (overload) issues?!

Actually, I saw this partially of issue 3, but maybe I was combining too much. My idea was that we can only improve the reliability, if we know what the current issue is. However, I am also fine with a dedicated one ...


From my point of view, we can also merge #545, but I'll leave that for you.