shotover/shotover-proxy

Intermittent windsock flamegraph CI failure

Closed this issue · 2 comments

rukai commented

For some reason our windsock benches are being killed by the GitHub runner when running the flamegraph profiler image.

From researching the error "Process completed with exit code 143", all I can tell is that our VM was killed because it displeased the GitHub Actions runner.
This could be due to excessive memory usage, CPU usage or something else entirely.
Although I've seen CPU usage mentioned, that doesn't make any sense because the hypervisor should be able to just limit that as it pleases without killing us.
I'm also going to rule out disk usage, as locally that bench only writes 9MB worth of perf data to the disk.
Memory usage also doesn't seem to be the issue: locally it only goes up by 2GB, not enough to reach the 7GB limit of GA VMs.
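
For reference, exit code 143 follows the common shell convention where a status above 128 means "terminated by signal (status - 128)", so 143 corresponds to signal 15, SIGTERM. The sketch below just decodes that convention. The function name `describe_exit_code` is made up for illustration and is not part of windsock or the GitHub Actions tooling, and the decoding says nothing about who sent the signal or why.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper, illustration only)."""
    # Shells such as bash report "killed by signal N" as status 128 + N.
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(143))
```

On Linux this reports SIGTERM for 143. For comparison, a process killed directly by the Linux OOM killer receives SIGKILL, which would normally show up as 137 instead, although that alone does not rule out the runner reacting to resource pressure in some other way.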

So now I'm really lost.
I do know that we have a workaround in place in CI to make this bench complete in a reasonable amount of time instead of 5h, I think due to perf being an older version?

I have not seen this since moving the majority of the windsock benches into the integration test workflows.
Possibly that GA instance was just getting overwhelmed at that point, and now that that workflow has less in it, it's not reaching its breaking point.

I'll close this issue as it no longer occurs.