OctoPrint/OctoPrint

Busy loop in GCODE analysis thread causes server to get blocked

Closed this issue · 4 comments

Problem

As found thanks to this forum thread, OctoPrint's GCODE analysis currently blocks the whole server, which is especially an issue when analysing a whole bunch of files on server startup, and less so an issue when processing a freshly uploaded single file.

The culprit seems to be this:

try:
# let's wait for stuff to finish
while p.returncode is None:
if self._aborted:
# oh, we shall abort, let's do so!
p.commands[0].terminate()
raise AnalysisAborted(reenqueue=self._reenqueue)
# else continue
p.commands[0].poll()
finally:
p.close()

Using poll here returns immediately, other than obviously understood when implementing things as they are it is not blocking. This leads to the whole thing being a busy loop with no sleep, meaning the CPU is completely taken up by this single thread and there's next to no chance for the system to schedule other threads.

Solution

This loop needs to be made non busy. One options is to instead of poll simply use wait, but that has the disadvantage that we won't be able to quickly abort an analysis in the middle of it, e.g. on start of a print or when the file gets deleted.

Any further points

Similar code snippets are also used in the implementation of octoprint.util.commandline.CommandlineCaller and the by now hopefully pretty much unused update-octoprint.py helper script of the software updater.

The former sees use in installing and updating plugins and such, and therefore needs to be investigated. The issue with using wait here is the need to report back any generated stdout or stderr lines to the user in intervals. The issue has possibly been made less problematic here by the associated readlines calls already containing a half second timeout each.

This issue has been mentioned on OctoPrint Community Forum. There might be relevant details there:

https://community.octoprint.org/t/python-looping-after-installing-preheat-button/57933/7

Fixed by the above commit, and I also took care of a refactoring to make CommandlineCaller thing clearer. The update-script turned out to not be affected either thanks to blocking I/O inside, it really looks like it was only the gcode analysis queue doing a busy loop.

Currently this is only available on maintenance, but the goal is to backport the two commits to 1.10.1 once that's going out.

Backported to 1.10.1.