pathfinder-for-autonomous-navigation/FlightSoftware

CI Timing Out Due to Persistent Flight Software Binary Threads

Closed this issue · 6 comments

TL;DR: PTest's thread handling for flight software is broken and isn't shutting down the flight software binary processes properly. Running all the mission checkouts locally on my desktop eventually pinned all 24 cores and grabbed 32 GB of RAM plus 38 GB of swap, crashing other processes running in the background, like my Chrome tabs.

Upgrade PTest to properly shut things down.
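
For reference, a minimal sketch of what "properly shut things down" could look like on the PTest side, assuming the flight software binaries are launched as `subprocess.Popen` children. The `stop_process` helper and the `grace_seconds` parameter are hypothetical names for illustration, not existing PTest code. The idea is to send SIGTERM, wait a bounded amount of time, and fall back to SIGKILL so a stuck binary can't outlive the test run:

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 2.0) -> int:
    """Ask a child process to exit, then force-kill it if it does not."""
    if proc.poll() is None:              # child is still running
        proc.terminate()                 # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()                  # SIGKILL on POSIX
            proc.wait()                  # reap the child so no zombie is left
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for a flight software binary that never exits on its own.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("exit code:", stop_process(child))
```

The bounded `wait` is the important part: without a timeout, the harness just inherits the hang from the child.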

So I think something like this is happening now that I've added changes to handle signals properly:

  1. Upon PTest termination, we send SIGTERM to the flight software process, and the main thread handles it.
  2. The main thread then tells the reader thread in debug_console to stop and blocks until that thread exits.
  3. The reader thread, however, is blocked waiting for another line of input and therefore never exits.
  4. This means the main thread never exits either, and we're left with extra processes lingering on the system.
  5. I'm still not sure why this grabs more and more memory and pins the CPU, though.
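
The blocking pattern in steps 2 and 3 can be sketched in Python as an analogy. This is not the actual debug_console code (which lives in the flight software binary), and `reader_loop`, `stop`, and `poll_seconds` are hypothetical names. One common way around the hang is to have the reader wait for input with a timeout so it gets a chance to check the stop flag. The sketch assumes a POSIX system, since `select` on pipes doesn't work on Windows:

```python
import os
import select
import threading


def reader_loop(fd: int, stop: threading.Event, lines: list,
                poll_seconds: float = 0.1) -> None:
    """Read lines from fd, waking up periodically to check the stop flag."""
    buffer = b""
    while not stop.is_set():
        ready, _, _ = select.select([fd], [], [], poll_seconds)
        if not ready:
            continue                 # no input yet; re-check the stop flag
        chunk = os.read(fd, 4096)
        if not chunk:
            break                    # writer closed the pipe
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            lines.append(line.decode())


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    stop = threading.Event()
    received = []
    reader = threading.Thread(target=reader_loop,
                              args=(read_fd, stop, received))
    reader.start()

    stop.set()                       # request shutdown with no input pending
    reader.join(timeout=2.0)         # bounded join, unlike step 2 above
    print("reader exited:", not reader.is_alive())
    os.close(read_fd)
    os.close(write_fd)
```

With a plain blocking read in place of the `select` call, the `join` would wait until another line of input showed up, which is the situation described in step 3.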

Hmmm... so perhaps this is more of a problem on Linux than on macOS? I don't see the phantom flight software processes on macOS, just the downlink parser.

Merging this should hopefully unblock the other PRs that are blocked on CI.

Added this BLAS limiter to pass CI. Unsure if it's caused by this PR, but it should be fine to add:
https://stackoverflow.com/questions/52026652/openblas-blas-thread-init-pthread-create-resource-temporarily-unavailable
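
The exact limiter isn't shown in this thread. A common fix for that OpenBLAS error is to cap the thread pool through the `OPENBLAS_NUM_THREADS` environment variable before the process that loads OpenBLAS starts. A rough sketch, assuming the limit is applied to a child process's environment (`env_with_blas_limit` is a hypothetical helper, not code from the PR):

```python
import os
import subprocess
import sys


def env_with_blas_limit(threads: int = 1) -> dict:
    """Return a copy of the current environment with OpenBLAS threads capped."""
    env = dict(os.environ)
    env["OPENBLAS_NUM_THREADS"] = str(threads)
    return env


if __name__ == "__main__":
    # The child only echoes the variable; a real run would launch the tests.
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"],
        env=env_with_blas_limit(1),
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())
```

Setting the variable in the CI job's environment should have the same effect, as long as it is in place before the library is loaded.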

Closing per #801