We compute and present the coverage of sequence sampling in local authorities (LAs) in the UK. We define the coverage as the number of sequences per case, and calculate one value per day per LA. To smooth the effects of small numbers, the sequence and case counts assigned to a day are the sums of sequences and cases over the preceding two weeks.
Sequences are counted from the COG-UK server, and cases are accessed from the ONS API (https://api.coronavirus.data.gov.uk).
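
The windowed coverage described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code used to produce the published values: the function name `rolling_coverage`, the dictionary inputs, and the choice of a 14-day window that ends on (and includes) the day itself are all assumptions.

```python
from datetime import date, timedelta


def rolling_coverage(sequences, cases, window_days=14):
    """Return {day: coverage} for a single LA.

    sequences, cases: dicts mapping date -> daily count (missing days are 0).
    Coverage on a day is the windowed sequence count divided by the windowed
    case count. ASSUMPTION: the window is the `window_days` days ending on,
    and including, that day. Days with no cases in the window give None.
    """
    days = sorted(set(sequences) | set(cases))
    coverage = {}
    for day in days:
        window = [day - timedelta(days=k) for k in range(window_days)]
        n_seq = sum(sequences.get(d, 0) for d in window)
        n_cases = sum(cases.get(d, 0) for d in window)
        coverage[day] = n_seq / n_cases if n_cases else None
    return coverage


# Hypothetical LA with 10 cases and 2 sequences on each of 20 days.
start = date(2021, 1, 1)
daily_cases = {start + timedelta(days=i): 10 for i in range(20)}
daily_sequences = {start + timedelta(days=i): 2 for i in range(20)}
coverage = rolling_coverage(daily_sequences, daily_cases)
```

With constant daily counts the coverage is 0.2 on every day, because the window sums scale the numerator and denominator equally.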
The coverages lend themselves to the definition of sample weights, which
can be used to reweight sequences in analyses. The inverse of the
coverage is the number of cases ($c$) per sequence ($s$), which we denote
$r_{l,d} = c_{l,d} / s_{l,d}$ for LA $l$ and day $d$. Then, to rebalance
all LAs at a single day $d$, we define the weights

$$w_{l,d} = \frac{r_{l,d}}{R_d},$$

where $R_d = \max_{l} r_{l,d}$.
Likewise, to consider a timeseries over a single LA (or a larger,
aggregated geography), we would define

$$w_{l,d} = \frac{r_{l,d}}{R_l},$$

where $R_l = \max_{d} r_{l,d}$. The consequence of these reweightings is that
the most representative sequence retains a weight of 1, and all other
sequences are downweighted relatively.
From these, we can determine the effective sample size (ESS) of e.g. the set of sequences sampled on a given day $d$:

$$\mathrm{ESS}_d = \sum_{l} s_{l,d}\, w_{l,d} = \frac{1}{R_d} \sum_{l} c_{l,d}.$$

Using these weight definitions, the ESS is the total number of cases scaled down by the smallest coverage (sequences per case) among all LAs, since $1/R_d = \min_{l} s_{l,d}/c_{l,d}$.
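
As a small worked illustration of the single-day reweighting, the sketch below divides each LA's cases-per-sequence ratio by the largest ratio among LAs, then takes the ESS to be the sum of the weights over all sequences. That reading of the ESS is an assumption, chosen because it reproduces the statement above (total cases scaled by the smallest coverage). The function names, LA labels, and counts are all hypothetical.

```python
def day_weights(cases, sequences):
    """Weights that rebalance all LAs on one day.

    cases, sequences: dicts mapping LA label -> windowed count for that day.
    LAs without sequences are skipped, since there is nothing to weight.
    Assumes at least one LA has both cases and sequences.
    """
    ratio = {
        la: cases[la] / sequences[la]
        for la in sequences
        if sequences[la] > 0
    }
    largest = max(ratio.values())  # LA whose sequences each stand for most cases
    return {la: r / largest for la, r in ratio.items()}


def effective_sample_size(weights, sequences):
    """Sum of weights over all sequences (ASSUMED ESS definition)."""
    return sum(sequences[la] * w for la, w in weights.items())


# Hypothetical windowed counts for three LAs on one day.
cases = {"LA_A": 100, "LA_B": 200, "LA_C": 50}
sequences = {"LA_A": 10, "LA_B": 5, "LA_C": 25}

weights = day_weights(cases, sequences)
ess = effective_sample_size(weights, sequences)

# Cross-check: total cases times the smallest coverage among LAs.
smallest_coverage = min(sequences[la] / cases[la] for la in cases)
ess_check = sum(cases.values()) * smallest_coverage
```

Here LA_B has the lowest coverage (5 sequences for 200 cases), so its sequences keep a weight of 1, and `ess` agrees with `ess_check` up to floating-point rounding.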
In file a0-weights.csv, we store weights normalised by the LA mean.