UKHSA-Internal/coronavirus-dashboard-api-python-sdk

Add support for API version 2

lmmx opened this issue · 0 comments

lmmx commented

I'm looking into how we can move this SDK along to version 2, and wanted to suggest one approach, and discuss suitability or other options.

Firstly, I was expecting to retain the Cov19API class and just expand it to support both v1 and v2 APIs. The v1 API timestamp endpoint is used within this class, and this does not appear to be available on the v2 API so that would need to be retained even when obtaining data from v2.

  • An alternative option would be to copy the Cov19API and make a Cov19APIv2 class in an entirely new module which could avoid making a 'Frankenstein' class trying to juggle needs of two distinct API versions. Any duplication could then simply be refactored into a base class from which both inherit.

Specifically, to retain backwards compatibility, I was expecting to initialise the Cov19API with a parameter version defaulting to "1", and storing a class attribute _supported_versions: list[str] = ["1", "2"] against which this is checked on initialisation.

Looking at the features of the v1 API and those in the new one, filters has been removed, and only partly reimplemented in the v2 API. One solution here would be to make the filters parameter optional, and then if provided enforce version == "2" else enforce v1.

Personally I suspect that separating the APIs out into separate classes may indeed be the cleaner approach for library design, but as a user you probably just want a single 'entry point' and not to pick between multiple classes. This however leaves the messy situation of 'retaining backward compatibility' meaning that said single class would default to v1, hence my idea to automatically use v2 if filters were not specified (the only way I can think to resolve both these constraints).

  • typing.overload may be a nice way to cleanly specify distinct signatures for v1 and v2 usage of the class __init__ method

Other minor improvements I've already implemented or would like to:

  • Use from __future__ import annotations (allows parameterised types like dict rather than Dict; allows | instead of Union)
  • Use isort to ensure proper import structure; remove code comments indicating import convention
  • Lint with black for consistency among multiple developers (potentially could also use pre-commit hooks; would let you decide if desirable)

Further to this, my initial motivation for contributing was to amass an exhaustive list (which in the context of an SDK could simply be one or more Enum classes) of the metrics etc. which are available, and also to indicate which are incompatible (as is mentioned in parts of the API documentation). As mentioned in this blog post:

"not all metrics on the page are available for all areas every day. For instance, vaccination data in England are published at MSOA level, while in Scotland, they are published at local authority level."

An additional ‘nice-to-have’ (both for users to avoid hitting rate limits but also UKHSA to avoid excessive query load) would be to have a default resource management feature such that when the same resource is re-requested then the locally proxied file is used instead. This could use for example a sqlite database stored as a temporary file in /var/tmp or /var/cache (i.e. directories that persist on reboot), or just use the tempfile.mkdtemp defaults for non-Unix OS’s (Windows)