Fix flakyness
joshuamorton opened this issue · 4 comments
Despite prior work, still some flakyness. Source is
dist-packages/rosmaster/master_api.py:208: xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
. pub_uris is an empty list. This publisherUpdate method is an autogenerated method from an xml rpc api somewhere else in ROS, so debugging this is mildly annoying. See this error message:
[rosmaster.threadpool][ERROR] 2017-05-14 18:56:01,541: Traceback (most recent call last):
File "/opt/ros/indigo/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
result = cmd(*args)
File "/opt/ros/indigo/lib/python2.7/dist-packages/rosmaster/master_api.py", line 208, in publisher_update_task
xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
self.send_content(h, request_body)
File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
connection.endheaders(request_body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 778, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
I believe what's happening is that add_one is publishing an updated value and the nodes that subscribe to it don't exist/have unregistered themselves for some reason.
Upon further investigation, that's exactly what's happened:
...
[rosmaster.master][INFO] 2017-05-14 18:55:50,277: +SUB [/subber] /pyrostest/add_one_py_Yoga_Josh_Ub_10375_1868351629294374131 http://Yoga-Josh-Ub:57270/
...
~ 10 seconds
...
[rosmaster.master][INFO] 2017-05-14 18:56:01,478: -SUB [/subber] /pyrostest/add_one_py_Yoga_Josh_Ub_10375_1868351629294374131 http://Yoga-Josh-Ub:57270/
...
[rosmaster.master][INFO] 2017-05-14 18:56:01,540: publisherUpdate[/pyrostest/add_one] -> http://Yoga-Josh-Ub:41422/ : []
ERROR
Except its not actually a timeout issue, its a blocking issue, as arbitrarily increasing the timeout doesn't fix the problem.
For reference, I found where to look for the issue by hard cancelling after 10 seconds of waiting and using the output of the error message (like what lines it was on) to figure out where python was stalling. Hacky but functional.
see #18