mozilla-releng/balrog

TypeError /v2.auslib_web_admin_views_releases_v2_update_release


This showed up in Sentry on March 27th. There were 179 instances of it that day, and none since. The traceback is:

TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
  File "flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "connexion/decorators/decorator.py", line 68, in wrapper
    response = function(request)
  File "connexion/decorators/uri_parsing.py", line 149, in wrapper
    response = function(request)
  File "connexion/decorators/validation.py", line 196, in wrapper
    response = function(request)
  File "connexion/decorators/validation.py", line 399, in wrapper
    return function(request)
  File "connexion/decorators/response.py", line 112, in wrapper
    response = function(request)
  File "connexion/decorators/parameter.py", line 120, in wrapper
    return function(**kwargs)
  File "auslib/web/admin/views/releases_v2.py", line 39, in update_release
    new_data_versions = releases.update_release(name, body["blob"], body["old_data_versions"], body.get("when"), request.username, request.transaction)
  File "auslib/services/releases.py", line 521, in update_release
    set_by_path(new_data_versions, path, old_data_version + 1)

Obviously not a huge issue, but there's clearly at least one case here where old_data_version may not be initialized (possibly a race condition?).

We've had another spike of this in the past week, again only on stage.

I ran two staging releases today. The first was 122.0b2, which worked fine. The second had all of its per-locale Balrog submission tasks fail with this error. Perhaps a coincidence, but I wonder if there's some causation here?

OK, I know what's happening here! For the most recent spate of them today, with 122.0b3, we actually had two 122.0b3's triggered on Try. One was done by me, through Ship It, the other by @jcristau, directly through Treeherder. This is a bit weird, but ultimately shouldn't necessarily fail (although it would never happen in production...).

This becomes a problem because balrogscript sets old_data_versions to empty for every locale submission. I have a vague memory that this was done because the API requires it to always be set, but this is clearly wrong if the locale already exists in the release. This data ends up in old_data_versions in releases.py, which is how we end up with old_data_version set to None.
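To make the failure mode concrete, here is a minimal sketch of how an empty old_data_versions can produce the TypeError from the traceback. The get_by_path helper and the example path are assumptions for illustration only, not Balrog's actual code; the real lookup in releases.py may differ.

```python
def get_by_path(data, path):
    """Hypothetical lookup helper: returns None when any key on the path is missing."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


# What balrogscript reportedly sends for every locale submission.
old_data_versions = {}

# Illustrative path only; the real structure of the path is an assumption.
path = ("platforms", "Linux_x86_64-gcc3", "locales", "de")

# The locale already exists server-side, but the client supplied no version for it.
old_data_version = get_by_path(old_data_versions, path)

try:
    new_version = old_data_version + 1
except TypeError as e:
    # Same class of error as the one reported by Sentry.
    error_message = str(e)
```

If the client had passed the locale's current data version at that path, the lookup would return an int and the increment would succeed.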

There are a few actions we ought to take here, although I don't think anything is urgent now that we know what's going on:

  • Ensure balrogscript passes old_data_versions when appropriate.
  • Ensure Balrog's admin API allows old_data_versions to be empty (this probably means we need some post-swagger verification that they're only empty when appropriate).
  • Add a safeguard in releases.py that fails with a more appropriate and useful error if old_data_version is None when it needs to be an int.
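The third item could look roughly like the sketch below. The next_data_version function and the choice of ValueError are hypothetical; Balrog may well prefer one of its own exception types so that the admin API returns a 400 rather than a 500.

```python
def next_data_version(old_data_version, path):
    """Return the incremented data version, or fail with a descriptive error.

    Hypothetical helper: the name, signature, and exception type are
    assumptions, not part of Balrog's existing code.
    """
    if old_data_version is None:
        raise ValueError(
            "old_data_versions is missing an entry for {}; it must be provided "
            "when that part of the release already exists".format(".".join(path))
        )
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(old_data_version, bool) or not isinstance(old_data_version, int):
        raise ValueError(
            "old_data_version for {} must be an int, got {!r}".format(
                ".".join(path), old_data_version
            )
        )
    return old_data_version + 1
```

The call in update_release would then become something like set_by_path(new_data_versions, path, next_data_version(old_data_version, path)), so a missing version surfaces as a clear client error instead of a TypeError.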

I filed mozilla-releng/scriptworker-scripts#885 for the balrogscript issue.