Gracefully handle rustaceans.org going down
pietroalbini opened this issue
Highfive stopped applying labels to rust-lang/rust a few days ago.
rust-lang/rust#49518 might be the last time it happened. That was six days ago.
add_labels had a modification merged seventeen days ago in #112.
is_new_contributor had modifications merged ten days ago in #119.
I'm not sure when code gets deployed, but I haven't found any recent examples of Highfive posting comments on new PRs. I think rust-lang/rust#49633 should have had a comment posted on it.
@nrc: Does the production Highfive produce logs that you can copy in here?
I set up a dev instance of Highfive. When handling PR creation, this happens:
Traceback (most recent call last):
File "/Users/davidalber/dev/highfive/highfive/newpr.py", line 426, in <module>
new_pr(payload, user, token)
File "/Users/davidalber/dev/highfive/highfive/newpr.py", line 370, in new_pr
set_assignee(reviewer, owner, repo, issue, user, token, author, to_mention)
File "/Users/davidalber/dev/highfive/highfive/newpr.py", line 112, in set_assignee
irc_name_of_reviewer = get_irc_nick(assignee)
File "/Users/davidalber/dev/highfive/highfive/newpr.py", line 324, in get_irc_nick
data = urllib2.urlopen(rustaceans_api_url.format(username=gh_name))
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 467, in error
result = self._call_chain(*args)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 654, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 503: Service Unavailable
The http://www.ncameron.org/rustaceans/ service used by get_irc_nick is down, and Highfive is not currently resilient to outages of that service (e.g., try https://www.ncameron.org/rustaceans/user?username=nrc). The call to get_irc_nick happens immediately after the GitHub API call to set the assignee, so the uncaught HTTPError aborts new_pr before it finishes handling the PR.
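A minimal sketch of a more forgiving get_irc_nick follows. It is written for Python 3 (urllib.request) rather than the Python 2 urllib2 seen in the traceback, and it is not Highfive's actual code. Only the URL pattern comes from this issue; RUSTACEANS_API_URL, parse_irc_nick, the fetch parameter, simulate_outage, and the assumed response shape (a JSON list of user records with an "irc" key) are hypothetical. The point is that any network or HTTP failure yields None instead of an exception, so the caller can skip the IRC ping and carry on.

```python
import json
import urllib.error
import urllib.request

# Hypothetical template, modeled on the rustaceans_api_url used in the traceback.
RUSTACEANS_API_URL = "https://www.ncameron.org/rustaceans/user?username={username}"


def _fetch(url, timeout=10):
    """Fetch a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def parse_irc_nick(body):
    """Extract the IRC nick from a rustaceans response body.

    ASSUMPTION: the body is a JSON list of user records, each with an "irc" key.
    Returns None if the body is not JSON, has an unexpected shape, or the nick is empty.
    """
    try:
        records = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return None
    if not isinstance(records, list) or not records:
        return None
    first = records[0]
    if not isinstance(first, dict):
        return None
    return first.get("irc") or None


def get_irc_nick(gh_name, fetch=_fetch):
    """Return the user's IRC nick, or None if rustaceans is unavailable."""
    url = RUSTACEANS_API_URL.format(username=gh_name)
    try:
        body = fetch(url)
    except OSError:
        # In Python 3, URLError (and its subclass HTTPError, e.g. the 503 above)
        # derives from OSError, as do socket timeouts: treat all as "no nick".
        return None
    return parse_irc_nick(body)


def simulate_outage(url):
    """Hypothetical stand-in fetcher that fails the way rustaceans did here."""
    raise urllib.error.HTTPError(url, 503, "Service Unavailable", None, None)


if __name__ == "__main__":
    # During an outage the lookup degrades to None instead of crashing new_pr.
    print(get_irc_nick("nrc", fetch=simulate_outage))
```

With this shape, set_assignee would mention the reviewer's IRC nick only when get_irc_nick returns something other than None; injecting fetch also makes the outage path easy to exercise without touching the network.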
Thanks for investigating this! For now we can probably skip sending IRC pings when the service is unavailable, but I'm not sure that depending on an external service just to get the IRC nickname is a good idea. Maybe in the future we could keep a local copy of that repo, updated every few hours?
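The local-copy idea could look roughly like the sketch below: an in-memory map from GitHub name to IRC nick that is refreshed when it is older than a few hours, and that keeps serving the last good data if a refresh fails. NickCache, refresh_fn, and the four-hour default are hypothetical, and how the local copy of the rustaceans repo is fetched and turned into a {github_name: irc_nick} dict is left out as an assumption.

```python
import time


class NickCache:
    """Hypothetical cache of GitHub-name -> IRC-nick mappings.

    refresh_fn is assumed to return a dict {github_name: irc_nick} built from a
    local copy of the rustaceans data; it may raise if the copy can't be updated.
    """

    def __init__(self, refresh_fn, max_age_seconds=4 * 60 * 60, clock=time.monotonic):
        self._refresh_fn = refresh_fn
        self._max_age = max_age_seconds
        self._clock = clock
        self._nicks = {}
        self._loaded_at = None  # clock value of the last successful refresh

    def _maybe_refresh(self):
        now = self._clock()
        if self._loaded_at is not None and now - self._loaded_at < self._max_age:
            return  # data is still fresh enough
        try:
            self._nicks = dict(self._refresh_fn())
            self._loaded_at = now
        except Exception:
            # Keep serving stale data rather than failing the PR handler;
            # since _loaded_at is unchanged, the next lookup retries the refresh.
            pass

    def get(self, gh_name):
        """Return the cached IRC nick for gh_name, or None if unknown."""
        self._maybe_refresh()
        return self._nicks.get(gh_name)
```

A None result means "no IRC ping", which matches the short-term behavior suggested above; the design trades a few hours of staleness for not needing the service to be up whenever a PR is opened.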
rustaceans is back online, so this should be fixed. However, it would be much better if Highfive didn't just give up when rustaceans is down.
Indeed it is working again. Highfive applied the expected label in rust-lang/rust#49718.
@nrc: Shall we set up a Pingdom alert, or something similar, for rustaceans?
I think if we can handle it here, then there is no need: it's not critical infrastructure, it rarely goes down, and when it does I hear about it pretty quickly (the bottleneck in getting it back up is usually my having time to fix it, not a lack of knowing that it is down).