m-lab/traceroute-caller

Open file descriptor leak

Closed this issue · 3 comments

Prometheus metrics for process_open_fds show that the traceroute container has an fd leak. See image below:

[Image: traceroute-fdleak]

After a rollout on the 16th, the fd count increases steadily on each machine until it reboots.

Only lga0* nodes are shown for convenience. The pattern is global. It originally started on ~Nov 8th in staging and Nov 14th in production.

sum by(machine, container, deployment) (process_open_fds{machine=~".*", container=~"traceroute"})

This is not yet resolved. The first two bumps are the load tests using v0.3.3. The steady growth without synthetic traffic correlates with increases in process count and goroutine count.

[Image: Screen Shot 2019-12-05 at 8 26 46 AM]

Well, dang.