xmlrpc.XmlRpcNode consumes 100%+ cpu after being shut down
drjsmith opened this issue · 7 comments
TL;DR: calling shutdown() on an XmlRpcNode does not fully shut it down and leaves a thread running.
I am currently on Noetic/Ubuntu 20.04/python 3.8.
What is going on:
- Calling start() results in a new thread running self.server.serve_forever(), where server is of type ThreadingXMLRPCServer, which has a base class of socketserver.BaseServer.
- That function, defined for BaseServer in socketserver.py:215, will not exit until the loop's condition is satisfied: while not self.__shutdown_request:. self.__shutdown_request can be set by calling shutdown() on BaseServer (see the paraphrased socketserver code after the snippet below).
- However, XmlRpcNode's shutdown() method never calls self.server.shutdown(), only server.socket.close() and server.server_close(). I'm pretty sure those last 2 functions do the same thing, since:
class TCPServer(BaseServer):
    ...
    def server_close(self):
        self.socket.close()
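For reference, the serve_forever()/shutdown() pair described above looks roughly like this (paraphrased and condensed from CPython 3.8's socketserver.py, so details may differ slightly from the real source):

# Paraphrased/condensed from CPython 3.8 socketserver.BaseServer (not verbatim)
class BaseServer:
    def serve_forever(self, poll_interval=0.5):
        self.__is_shut_down.clear()
        try:
            with _ServerSelector() as selector:
                selector.register(self, selectors.EVENT_READ)
                # Loops until shutdown() flips __shutdown_request; closing the
                # listening socket alone never makes this condition false.
                while not self.__shutdown_request:
                    ready = selector.select(poll_interval)
                    if self.__shutdown_request:
                        break
                    if ready:
                        self._handle_request_noblock()
                    self.service_actions()
        finally:
            self.__shutdown_request = False
            self.__is_shut_down.set()

    def shutdown(self):
        # The only supported way to end serve_forever(); blocks until the loop exits.
        self.__shutdown_request = True
        self.__is_shut_down.wait()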
The thread is then stuck in an infinite loop. Before shutting down XmlRpcNode, the thread does not use a significant amount of cpu; afterwards, however, the thread consumes 100% of a single core (as shown by top).
It seems to me that the solution may be as simple as adding self.server.shutdown() to XmlRpcNode's shutdown() function, though I'm not certain whether it should be before or after closing the socket, or if there are other considerations.
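To make the idea concrete, here is a minimal, hypothetical sketch. This is not the actual ros_comm source; the attribute names and method signature are assumptions for illustration only:

# Hypothetical sketch of the proposed fix, not the real XmlRpcNode.shutdown():
# the point is simply to stop serve_forever() before tearing the server down.
def shutdown(self, reason):
    if self.server is not None:
        server, self.server = self.server, None
        server.shutdown()      # ends the serve_forever() loop so the thread can exit
        server.server_close()  # closes the listening socket

One consideration worth noting: BaseServer.shutdown() blocks until serve_forever() returns, so it has to be called from a thread other than the one running the serve loop, or it will deadlock.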
Why this matters to me:
I'm guessing this isn't a common problem, since generally the process would be ending shortly afterwards. However, I am using roslaunch.parent.ROSLaunchParent to launch and shut down nodes as part of a python-based robot navigation benchmarking codebase. I wrote the code and used it extensively on Kinetic/Ubuntu 16.04/python 2.7 and did not encounter this issue at that time: starting and stopping hundreds of instances of roslaunch.parent.ROSLaunchParent did not cause any noticeable issues. As soon as I switched to Noetic/Ubuntu 20.04/python 3.8, I noticed that after running a few experiments the python process was at times using ~166% cpu. Usage hits 100% if there is only 1 'stopped' instance and ~166% if there are 2 or more.
The cpu usage of the process drops very low as soon as I start a new instance; I'm not familiar enough with the Selector-related code to understand why this is so.
Minimal example:
The script starts a roscore and then repeatedly creates, starts, and stops roslaunch.parent.ROSLaunchParent instances; run top to see how the process' cpu usage changes.
import os
import time

import roslaunch


class RoslaunchTest(object):
    def __init__(self, is_core=True):
        os.environ["ROS_MASTER_URI"] = "http://localhost:11312"
        self.is_core = is_core

    def start(self):
        print("Starting...")
        uuid = roslaunch.rlutil.get_or_generate_uuid(None, not self.is_core)
        self.roslaunch_object = roslaunch.parent.ROSLaunchParent(
            run_id=uuid, roslaunch_files=[],
            is_core=self.is_core, port=11312
        )
        self.roslaunch_object.start()
        print("Started!")

    def stop(self):
        print("Stopping...")
        self.roslaunch_object.shutdown()
        print("Stopped!")


def demo():
    core = RoslaunchTest(is_core=True)
    core.start()
    test = RoslaunchTest(is_core=False)

    def cycle():
        test.start()
        time.sleep(5)
        test.stop()
        time.sleep(10)

    for _ in range(4):
        cycle()


if __name__ == '__main__':
    demo()
I've noticed the same issue on ROS Melodic/Ubuntu 18, but only with Python 3.6.9. It does not occur with Python 2.7.
Same for me. Unusable at the moment. We want to switch between two driving modes with different sets of nodes, and we want to avoid running all of them when it is not necessary. For that we have a launch handler with a service which starts the corresponding launch file. Each time one is killed, a new instance of my handler appears in my process list, which generates 100% cpu load; every on/off cycle creates another one.
@TobiMiller @fgrcar Here is how I worked around the problem:
# Class that fixes the problem
class RoslaunchShutdownWrapper(roslaunch.parent.ROSLaunchParent):
    def shutdown(self):
        server = self.server.server if self.server is not None and self.server.server is not None else None
        super(RoslaunchShutdownWrapper, self).shutdown()
        if server:
            server.shutdown()

    def __del__(self):
        print("Shutting down now!!!!!!")
        self.shutdown()

# How to use it:
roslaunch_object = RoslaunchShutdownWrapper(
    run_id=..., roslaunch_files=...,
    is_core=..., port=...,
    sigterm_timeout=...
)
roslaunch_object.start()
Let me know if it works for you.
@drjsmith Thanks so much for posting what worked for you!
I'm surprised this hasn't been addressed sooner; it's really a pain. I made a Qt user interface using the roslaunch API, basically to launch and shut down nodes of our stack. But I'm seeing the same CPU problem you mentioned as I shut down nodes, which makes our Qt interface extremely slow and impossible to use while some nodes are being shut down.
Your suggestion seems to work for me so far, though! Based on it, I wrote a quick class that the Qt interface calls to launch and shut down nodes:
class RoslaunchWrapperObject(roslaunch.parent.ROSLaunchParent):
    def start(self):
        super(RoslaunchWrapperObject, self).start()
        print(self.server)

    def stop(self):
        print("Stopping...")
        print(self.server)
        server = self.server.server if self.server is not None and self.server.server is not None else None
        super(RoslaunchWrapperObject, self).shutdown()
        if server:
            server.shutdown()
To launch a node from the Qt interface:
def launch(parameter):
    uuid = roslaunch.rlutil.get_or_generate_uuid(None, False)
    roslaunch.configure_logging(uuid)
    cli_args = parameter
    roslaunch_args = cli_args[1:]
    roslaunch_file = [(roslaunch.rlutil.resolve_launch_arguments(cli_args)[0], roslaunch_args)]
    launcher = RoslaunchWrapperObject(run_id=uuid, roslaunch_files=roslaunch_file)
    return launcher
and the calls are made like this:
cli_args = ['PATH/online.launch','simulation:=true']
self.launch = launch(cli_args)
self.launch.start() # to start it
self.launch.stop() # to stop it
Thank you so much for this, it really helped! I do hope they address the issue and find a proper way to fix it so we can reuse the API just like on previous ROS versions.
@RodolpheCyber You're welcome!
Is this issue still present on Melodic?
@adipotnis I discovered the problem when upgrading from kinetic to noetic, so I haven't personally tested it on melodic. @fgrcar reported encountering the problem on melodic but only when using python 3. I wouldn't be surprised if the root bug has been present for many versions and has only been exposed due to the transition to python 3, though I haven't verified this.