mozilla-releng/balrog

requests to aus-api.mozilla.org/api/v1/releases are slow

Looks like this is for two reasons:

  1. We don't delete old releases in the v2 tables
  2. We don't optimize the queries when names_only is passed (we still query scheduled changes tables, and pull unneeded data).

For now, let's improve the queries. We can look at deleting old releases later, if it becomes necessary.

Part 2 is addressed by #1738

wsmwk commented

This is consistently getting worse: 20 to 30 seconds, sometimes more.

Currently I'm seeing about 20 seconds for https://aus-api.mozilla.org/api/v1/releases (returns about 1.7 MB of data).

A more targeted query, like https://aus-api.mozilla.org/api/v1/releases?product=Thunderbird (returns 400 KB), is not appreciably faster.

For comparison, retrieving a specific release like https://aus-api.mozilla.org/api/v1/releases/OpenH264-1.8.1.1-with-mac-aarch64 (4 KB) takes less than a second.

https://aus-api.mozilla.org/api/v1/releases?names_only=1 quickly returns a list of all release names, and https://aus-api.mozilla.org/api/v1/releases/{name} quickly returns info about the named release.

There are about 10000 releases now, so it's not practical to retrieve the names and then retrieve details for each -- unless the list of releases can be significantly filtered on the release name only.
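As a rough sketch of the "filter on the release name only" idea: take the list of names, keep those matching a prefix, and build the per-release URLs for just that subset. The helper names are hypothetical, the `Thunderbird-` and `Firefox-` release names are made up for illustration, and the list is hard-coded rather than fetched, so nothing here assumes the shape of the `names_only` response.

```python
from urllib.parse import quote

BASE_URL = "https://aus-api.mozilla.org/api/v1/releases"


def select_names(names, prefix, limit=None):
    """Return the names that start with prefix, sorted, optionally capped."""
    matches = sorted(n for n in names if n.startswith(prefix))
    return matches if limit is None else matches[:limit]


def detail_url(name):
    """Build the per-release URL for one release name."""
    return f"{BASE_URL}/{quote(name, safe='')}"


# Stand-in for the list returned by ?names_only=1.
names = [
    "OpenH264-1.8.1.1-with-mac-aarch64",
    "Thunderbird-115.0-build1",  # hypothetical name
    "Thunderbird-115.1-build2",  # hypothetical name
    "Firefox-116.0-build1",      # hypothetical name
]

urls = [detail_url(n) for n in select_names(names, "Thunderbird-")]
for url in urls:
    print(url)
```

This only helps if the prefix cuts the list down to a handful of names; fetching thousands of releases one at a time would likely be slower than the single slow request.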

By the way, the top 6 products by number of releases:

'Firefox': 5325
'Thunderbird': 2261
'Devedition': 1132
'SystemAddons': 490
'Fennec': 363
'Pinebuild': 224

get_releases() - https://github.com/mozilla-releng/balrog/blob/main/src/auslib/services/releases.py#L258 - goes to a lot of trouble to cross-reference the basic list of releases with rules, scheduled changes, and signoffs; I suspect (but haven't specifically verified) that this is where the time is spent. I haven't spotted any missing optimizations.

I wonder if it would be useful to add an additional mode to get_releases(), selected by a new parameter, that returns more than just the release names, but less of the time-consuming cross-referenced data.
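To make the suggestion concrete, here is a minimal sketch of what an intermediate mode could look like. It is not balrog's code: the `Detail` enum, the `list_releases` function, and the row fields are all hypothetical, and the expensive cross-referencing is represented by a callable passed in by the caller.

```python
from enum import Enum


class Detail(Enum):
    """Hypothetical detail levels for a release listing."""
    NAMES = "names"      # release names only
    SUMMARY = "summary"  # basic columns, no cross-referencing
    FULL = "full"        # basic columns plus cross-referenced data


def list_releases(rows, detail=Detail.FULL, cross_reference=None):
    """Shape stand-in release rows according to the requested detail level.

    rows: iterable of dicts with at least "name" and "product".
    cross_reference: callable returning the expensive extra fields for a row.
    """
    if detail is Detail.NAMES:
        return [row["name"] for row in rows]
    if detail is Detail.SUMMARY:
        return [dict(row) for row in rows]
    if cross_reference is None:
        raise ValueError("Detail.FULL needs a cross_reference callable")
    return [{**row, **cross_reference(row)} for row in rows]


calls = []


def fake_cross_reference(row):
    # Stands in for the rule, scheduled change, and signoff lookups.
    calls.append(row["name"])
    return {"rule_info": {}, "scheduled_changes": []}


rows = [{"name": "Example-1.0", "product": "Example"}]
summary = list_releases(rows, Detail.SUMMARY, fake_cross_reference)
full = list_releases(rows, Detail.FULL, fake_cross_reference)
print(summary)
print(full)
```

The point of the middle level is that the cross-referencing callable is never invoked for it, so its cost would scale with the base query alone.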

@wsmwk If you'd like to describe a particular use case, and that doesn't require all the data currently returned, that might suggest a way forward.

I don't think I ever profiled this to be certain, but it would not be surprising at all if the scheduled change and history queries were a big part of the slowdown. It looks like it's 5 additional queries per release, and IIRC there's a fair amount of per-query overhead.

(This could probably be verified fairly easily locally by comparing performance with and without the block with all of these queries.)
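With about 10000 releases and 5 extra queries each, that would be on the order of 50,000 queries per listing. The comparison can be sketched outside balrog with an in-memory SQLite database: the tables below are simplified stand-ins (not balrog's schema), and the "batched" variant is just one possible alternative, shown to contrast per-release lookups with a fixed number of queries.

```python
import sqlite3
import time


def build_db(n_releases):
    """Create an in-memory database with simplified stand-in tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE releases (name TEXT PRIMARY KEY, product TEXT)")
    conn.execute("CREATE TABLE scheduled_changes (sc_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX sc_name ON scheduled_changes (name)")
    conn.executemany(
        "INSERT INTO releases VALUES (?, ?)",
        [(f"release-{i}", "Example") for i in range(n_releases)],
    )
    # Every tenth release gets one pending scheduled change.
    conn.executemany(
        "INSERT INTO scheduled_changes (name) VALUES (?)",
        [(f"release-{i}",) for i in range(0, n_releases, 10)],
    )
    return conn


def list_per_release(conn):
    """One follow-up query per release (the N+1 pattern)."""
    result = {}
    queries = 1
    for (name,) in conn.execute("SELECT name FROM releases").fetchall():
        row = conn.execute(
            "SELECT COUNT(*) FROM scheduled_changes WHERE name = ?", (name,)
        ).fetchone()
        queries += 1
        result[name] = row[0]
    return result, queries


def list_batched(conn):
    """Two queries in total; the results are combined in Python."""
    result = {name: 0 for (name,) in conn.execute("SELECT name FROM releases")}
    for name, count in conn.execute(
        "SELECT name, COUNT(*) FROM scheduled_changes GROUP BY name"
    ):
        result[name] = count
    return result, 2


def timed(fn, conn):
    """Run fn(conn) and return its value together with the elapsed seconds."""
    start = time.perf_counter()
    value = fn(conn)
    return value, time.perf_counter() - start


conn = build_db(1000)
(per_release, n_slow), t_slow = timed(list_per_release, conn)
(batched, n_fast), t_fast = timed(list_batched, conn)
print(f"per-release: {n_slow} queries, {t_slow:.4f}s")
print(f"batched:     {n_fast} queries, {t_fast:.4f}s")
```

An in-process SQLite database has almost no per-query overhead, so any gap measured here likely understates what a networked database would show; the query counts are the more transferable result.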

When doing Thunderbird release final signoffs, we usually adjust the update rate by modifying the rule. Loading the form takes some time to populate the list of releases that the rule in question could point to, and it's definitely getting worse. 99% of the time we are only adjusting the update rate.

For Thunderbird beta, automation does set the rate based on the beta number, so we are changing it less frequently lately, but after a day or two we do tend to bump it a little higher, so automation does not completely meet our needs. Release has the same problem, except without automation.

Typically, Thunderbird has one beta a week, with a rate bump mid-week. Before a stable release, we will do 2 betas a week. Stable releases probably average out to 2+ish per month. January-June it's lower, July-December higher.