mozilla-releng/balrog

Recent balrogscript error related to old_data_versions

gbrownmozilla opened this issue · 3 comments

https://treeherder.mozilla.org/jobs?repo=try&revision=cc3b4c172fe9edd7eb04567a3c8aa3a00c0653a7
https://firefox-ci-tc.services.mozilla.com/tasks/GXrfm60gSyGhrcFNnuPzbQ/runs/0/logs/public/logs/live_backing.log

2022-06-07 14:42:52,783 - balrogclient.api - DEBUG - Data sent: {'blob': {'platforms': {'Linux_x86_64-gcc3': {'locales': {'en-US': {'buildID': '20220605185654', 'appVersion': '102.0', 'displayVersion': '102.0 Beta 4', ...}]}}}}}, 'old_data_versions': {'platforms': {'Linux_x86_64-gcc3': {'locales': {}}}}}
2022-06-07 14:42:52,951 - urllib3.connectionpool - DEBUG - https://admin-stage.balrog.nonprod.cloudops.mozgcp.net:443 "POST /api/v2/releases/Firefox-102.0b4-build1 HTTP/1.1" 500 107
2022-06-07 14:42:52,951 - balrogclient.api - ERROR - Caught HTTPError: b'{"type": "about:blank", "title": "Internal Server Error", "detail": "Internal Server Error", "status": 500}'
2022-06-07 14:42:52,952 - balrogclient.api - DEBUG - REQUEST STATS: {"timestamp": 1654612972.9521196, "method": "POST", "url": "https://admin-stage.balrog.nonprod.cloudops.mozgcp.net/api/v2/releases/Firefox-102.0b4-build1", "status_code": 500, "elapsed_secs": 0.16682}
2022-06-07 14:42:52,952 - redo - DEBUG - retry: Caught exception: 
Traceback (most recent call last):
  File "/app/lib/python3.9/site-packages/redo/__init__.py", line 170, in retry
    return action(*args, **kwargs)
  File "/app/lib/python3.9/site-packages/balrogscript/script.py", line 110, in <lambda>
    retry(lambda: submitter.run(**release), jitter=5, sleeptime=10, max_sleeptime=30, attempts=10)
  File "/app/lib/python3.9/site-packages/balrogscript/submitter/cli.py", line 482, in run
    balrog_request(session, "post", url, json=data)
  File "/app/lib/python3.9/site-packages/balrogclient/api.py", line 90, in balrog_request
    resp.raise_for_status()
  File "/app/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: https://admin-stage.balrog.nonprod.cloudops.mozgcp.net/api/v2/releases/Firefox-102.0b4-build1

From sentry:
https://sentry.io/organizations/mozilla/issues/3122737435/?project=6262499&statsPeriod=24h
TypeError /v2.auslib_web_admin_views_releases_v2_update_release
unsupported operand type(s) for +: 'NoneType' and 'int'

TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
  File "flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "connexion/decorators/decorator.py", line 68, in wrapper
    response = function(request)
  File "connexion/decorators/uri_parsing.py", line 149, in wrapper
    response = function(request)
  File "connexion/decorators/validation.py", line 193, in wrapper
    response = function(request)
  File "connexion/decorators/validation.py", line 396, in wrapper
    return function(request)
  File "connexion/decorators/response.py", line 112, in wrapper
    response = function(request)
  File "connexion/decorators/parameter.py", line 116, in wrapper
    return function(**kwargs)
  File "auslib/web/admin/views/releases_v2.py", line 39, in update_release
    new_data_versions = releases.update_release(name, body["blob"], body["old_data_versions"], body.get("when"), request.username, request.transaction)
  File "auslib/services/releases.py", line 517, in update_release
    set_by_path(new_data_versions, path, old_data_version + 1)

We have been hitting this TypeError occasionally for at least several months on staging, but I don't see it at all in production.

I think this means

old_data_version = get_by_path(old_data_versions, path)

has returned None.

Today there were some failures in Balrog submission-of-updates tasks in a staging release
https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&selectedTaskRun=G35nLntMQvuFoCaKXaHQpw.0&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=53c5d2ecac3074646b91bee763382aaf83ef0a87
that seemed correlated with this old_data_version error in sentry. In this case, somehow two promote_devedition tasks had been spawned just a second apart, resulting in duplicate Balrog submission tasks; for each of those, the first task succeeded, the next failed (task descriptions seemed pretty much identical, and the submitted old_data_version seemed identical).

2022-07-05 18:26:36,308 - balrogclient.api - DEBUG - Balrog request to https://admin-stage.balrog.nonprod.cloudops.mozgcp.net/api/v2/releases/Devedition-104.0b1-build1 via POST
2022-07-05 18:26:36,308 - balrogclient.api - DEBUG - Data sent: {'blob': {'platforms': {'Linux_x86-gcc3': {'locales': {'en-CA': {'buildID': '20220705111051', 'appVersion': '104.0', 'displayVersion': '104.0 Beta 1', 'completes': [{'from': '*', 'filesize': 66371739, 'hashValue': '2dfa6daf912595ba6c82701ac25428163d3d9c9c28bc4853e9dc38ec3410000fe56f2dcac88e7fb7635715c083edd3ce097577ab12a487e4b957663c6851f119'}], 'partials': [{'from': 'Devedition-74.0b4-build4', 'filesize': 57319725, 'hashValue': 'fdecf6ac2d335a226e329ca013c1aedf7fdc1ca3855f87fb4f61945ddfa691cf6304f95a4cce0adaec6b7d9dbc91d0d81de2e96553667644382bc3b6c8439770'}]}}}}}, 'old_data_versions': {'platforms': {'Linux_x86-gcc3': {'locales': {}}}}}
2022-07-05 18:26:36,310 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): admin-stage.balrog.nonprod.cloudops.mozgcp.net:443
2022-07-05 18:26:36,362 - urllib3.connectionpool - DEBUG - https://admin-stage.balrog.nonprod.cloudops.mozgcp.net:443 "POST /api/v2/releases/Devedition-104.0b1-build1 HTTP/1.1" 500 107
2022-07-05 18:26:36,364 - balrogclient.api - ERROR - Caught HTTPError: b'{"type": "about:blank", "title": "Internal Server Error", "detail": "Internal Server Error", "status": 500}'

vs.

2022-07-05 18:20:07,830 - balrogclient.api - DEBUG - Balrog request to https://admin-stage.balrog.nonprod.cloudops.mozgcp.net/api/v2/releases/Devedition-104.0b1-build1 via POST
2022-07-05 18:20:07,830 - balrogclient.api - DEBUG - Data sent: {'blob': {'platforms': {'Linux_x86-gcc3': {'locales': {'en-CA': {'buildID': '20220705111051', 'appVersion': '104.0', 'displayVersion': '104.0 Beta 1', 'completes': [{'from': '*', 'filesize': 66372719, 'hashValue': 'b20305ef753793444771085df3973178c0648c47a48a8adcd48b6a9d9bff7459791dbd0bba60eba078c06f7349a131aae4f1d1eb4be2db5a8d29be0aef25d4f1'}], 'partials': [{'from': 'Devedition-74.0b4-build4', 'filesize': 57318301, 'hashValue': '3006aaa59668522f8a70e617e6a81d86ed0e8cc39f57106c72473a8e665bfbe51d81da91a6e1cf3193a249305d107c01979412c778ce308204ac7439bfcec287'}]}}}}}, 'old_data_versions': {'platforms': {'Linux_x86-gcc3': {'locales': {}}}}}
2022-07-05 18:20:07,832 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): admin-stage.balrog.nonprod.cloudops.mozgcp.net:443
2022-07-05 18:20:08,048 - urllib3.connectionpool - DEBUG - https://admin-stage.balrog.nonprod.cloudops.mozgcp.net:443 "POST /api/v2/releases/Devedition-104.0b1-build1 HTTP/1.1" 200 101
2022-07-05 18:20:08,050 - balrogclient.api - INFO - Data recieved: {"platforms": {"Linux_x86-gcc3": {"locales": {"en-CA": 1}}}}
2022-07-05 18:20:08,050 - balrogclient.api - DEBUG - REQUEST STATS: {"timestamp": 1657045208.0502152, "method": "POST", "url": "https://admin-stage.balrog.nonprod.cloudops.mozgcp.net/api/v2/releases/Devedition-104.0b1-build1", "status_code": 200, "elapsed_secs": 0.217107}
2022-07-05 18:20:08,053 - balrogscript.script - INFO - Taskcluster Release style Balrog submission