gtagency/pyrostest

Fix flakyness

joshuamorton opened this issue · 4 comments

Despite prior work, still some flakyness. Source is

dist-packages/rosmaster/master_api.py:208: xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris). pub_uris is an empty list. This publisherUpdate method is an autogenerated method from an xml rpc api somewhere else in ROS, so debugging this is mildly annoying. See this error message:

[rosmaster.threadpool][ERROR] 2017-05-14 18:56:01,541: Traceback (most recent call last):
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rosmaster/master_api.py", line 208, in publisher_update_task
    xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 797, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 778, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err

I believe what's happening is that add_one is publishing an updated value and the nodes that subscribe to it don't exist/have unregistered themselves for some reason.

Upon further investigation, that's exactly what's happened:

...
[rosmaster.master][INFO] 2017-05-14 18:55:50,277: +SUB [/subber] /pyrostest/add_one_py_Yoga_Josh_Ub_10375_1868351629294374131 http://Yoga-Josh-Ub:57270/
...
~ 10 seconds
...
[rosmaster.master][INFO] 2017-05-14 18:56:01,478: -SUB [/subber] /pyrostest/add_one_py_Yoga_Josh_Ub_10375_1868351629294374131 http://Yoga-Josh-Ub:57270/
...
[rosmaster.master][INFO] 2017-05-14 18:56:01,540: publisherUpdate[/pyrostest/add_one] -> http://Yoga-Josh-Ub:41422/ : []
ERROR

Except its not actually a timeout issue, its a blocking issue, as arbitrarily increasing the timeout doesn't fix the problem.

For reference, I found where to look for the issue by hard cancelling after 10 seconds of waiting and using the output of the error message (like what lines it was on) to figure out where python was stalling. Hacky but functional.