m-lab/traceroute-caller

scamper pid leak


A PID leak was detected on mlab3.fln01:

https://github.com/m-lab/ops-tracker/issues/1204

After investigating the log, we found that the PID leak was caused by scamper, which was started after scamper-daemon failed:

Screen Shot 2020-11-02 at 3:08:51 PM

The green line marks the point where the scamper daemon died. Standalone scamper was then started, and the PID leak began.

About 11 hours later, the PID leak caused everything to crash.
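The issue does not say which processes made up the leak or whether they were zombies. One way to check on a Linux node is to group the entries under /proc by command name and process state. Many `scamper` entries in state `Z` would point at child processes that exited but were never reaped. Below is a minimal Python sketch. It assumes a standard procfs layout. `count_processes` and its `proc_root` parameter are hypothetical names used only for illustration; they are not part of traceroute-caller.

```python
import os
from collections import Counter


def count_processes(proc_root="/proc"):
    """Count processes by (command name, state) in a procfs-style tree.

    Hypothetical helper: each numeric directory under proc_root is one PID,
    and its "stat" file starts with: pid (comm) state ppid ...
    """
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # comm is wrapped in parentheses and may itself contain spaces or ")",
        # so take everything up to the LAST ")" instead of splitting on spaces.
        lparen = stat.index("(")
        rparen = stat.rindex(")")
        comm = stat[lparen + 1 : rparen]
        state = stat[rparen + 2]  # one-letter state, e.g. "S", "R", "Z"
        counts[(comm, state)] += 1
    return counts


if __name__ == "__main__":
    # Print the ten most common (comm, state) pairs on this machine.
    for (comm, state), n in count_processes().most_common(10):
        print(f"{n:6d}  {state}  {comm}")
```

Sampled periodically (for example, once a minute), a count for one `(comm, state)` pair that keeps growing is the kind of signal shown in the first screenshot.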

Until we track down the PID leak in scamper, we will replace the flag value

"scamper-daemon-with-scamper-backup"

with

"scamper-daemon"
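The effect of the flag change can be sketched as a small decision function. This is a hypothetical Python model, not traceroute-caller's code. The `Mode` enum and `pick_tool` function are illustrative names, and only the two flag values come from this issue. The sketch assumes that "with-scamper-backup" means "fall back to standalone scamper when the daemon is down", which is consistent with the timeline above. The issue does not describe what the real program does when the daemon dies in `scamper-daemon` mode, so in this sketch that case returns `None`.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    # The two flag values quoted in the issue; the flag's own name is not shown.
    DAEMON_WITH_BACKUP = "scamper-daemon-with-scamper-backup"
    DAEMON_ONLY = "scamper-daemon"


def pick_tool(mode: Mode, daemon_alive: bool) -> Optional[str]:
    """Hypothetical: which tool a trace would use, or None for no trace."""
    if daemon_alive:
        return "scamper-daemon"
    if mode is Mode.DAEMON_WITH_BACKUP:
        # Fallback path: standalone scamper, the process seen leaking PIDs.
        return "scamper"
    # DAEMON_ONLY: the standalone-scamper fallback is never taken.
    return None


# Parsing the flag string into a Mode:
mode = Mode("scamper-daemon")
print(pick_tool(mode, daemon_alive=False))  # None
```

With `scamper-daemon`, a dead daemon no longer leads to standalone scamper being started, so the leaking code path is avoided until the leak itself is fixed.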

A k8s-support PR will follow.

The PID count over the same period during which "scamper" was active in the image above:

Screen Shot 2020-11-02 at 4:29:15 PM