Sequencing coverage and weights

Coverage

We compute and present the coverage of sequence sampling in local authorities (LAs) in the UK. We define coverage as the number of sequences per case, and calculate one value per day per LA. To smooth the effects of small numbers, the number of sequences and the number of cases assigned to a day are each summed over the preceding two weeks.

Sequences are counted from the COG-UK server, and cases are accessed from the UK government coronavirus dashboard API (https://api.coronavirus.data.gov.uk).
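The coverage calculation can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: the function names and toy counts are hypothetical, and it assumes the two-week window is the 14 days ending on (and including) the day in question, which the text above does not specify.

```python
from datetime import date, timedelta


def rolling_sum(daily, day, window_days=14):
    """Sum daily counts over the window_days ending on (and including) day.

    Days missing from the dict are treated as zero (an assumption).
    """
    return sum(daily.get(day - timedelta(days=k), 0) for k in range(window_days))


def coverage(sequences, cases, day, window_days=14):
    """Sequences per case for one LA on one day.

    Returns None when there are no cases in the window, since the
    ratio is undefined there.
    """
    s = rolling_sum(sequences, day, window_days)
    c = rolling_sum(cases, day, window_days)
    return s / c if c > 0 else None


# Toy daily counts for a single hypothetical LA (not real data).
start = date(2021, 1, 1)
sequences = {start + timedelta(days=k): 2 for k in range(20)}
cases = {start + timedelta(days=k): 10 for k in range(20)}

cov = coverage(sequences, cases, date(2021, 1, 20))
print(cov)
```

With 2 sequences and 10 cases every day, the windowed totals are 28 and 140, so the coverage is 0.2 sequences per case.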

Weights

The coverages lend themselves to the definition of sample weights, which can be used to reweight sequences in analyses. The inverse of the coverage is the number of cases (c) per sequence (s), which we denote v_{i,j} = c_{i,j}/s_{i,j} for LA i and day j. Then, to rebalance all LAs on a single day j, we define the weights
w_{i,j} = v_{i,j}/v_j^*
where v_j^*=\max_i(v_{i,j}).

Likewise, to consider a timeseries over a single LA (or a larger, aggregated geography), we would define
w_{i,j} = v_{i,j}/v_i^*
where v_i^*=\max_j(v_{i,j}). Under either reweighting, sequences from the least-sampled LA or day (the one with the most cases per sequence) retain a weight of 1, and all other sequences are downweighted relative to them.
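As a small sketch of both definitions, the helper below divides each v by the largest v in the group. The function name and the toy numbers are hypothetical; passing values keyed by LA gives the single-day weights, and passing values keyed by day gives the timeseries weights. Entries with no sequences have an undefined v and are assumed to have been excluded beforehand.

```python
def max_normalised_weights(v):
    """Map each key (an LA or a day) to v / max(v).

    v: dict of key -> cases per sequence. Returns an empty dict for
    empty input.
    """
    if not v:
        return {}
    v_star = max(v.values())
    return {key: value / v_star for key, value in v.items()}


# Toy windowed counts for three hypothetical LAs on one day.
cases_by_la = {"A": 100, "B": 200, "C": 50}
sequences_by_la = {"A": 10, "B": 10, "C": 25}

# Cases per sequence for each LA.
v_by_la = {la: cases_by_la[la] / sequences_by_la[la] for la in cases_by_la}

weights_by_la = max_normalised_weights(v_by_la)
print(weights_by_la)
```

Here LA B has the most cases per sequence (20), so its sequences keep weight 1, while sequences from A and C are downweighted to 0.5 and 0.1.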

From these, we can determine the effective sample size (ESS) of e.g. the set of sequences sampled on a given day j:

ESS_j = \sum_iw_{i,j}s_{i,j}.

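Using these weight definitions, each term reduces to w_{i,j}s_{i,j} = c_{i,j}/v_j^*, where c_{i,j} is the number of cases in LA i on day j, so ESS_j = \sum_i c_{i,j}/v_j^*. That is, the ESS is the total number of cases multiplied by the smallest coverage (sequences per case) among all LAs.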
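The relationship between the ESS and the total number of cases can be checked numerically. The sketch below uses hypothetical names and toy counts, and assumes that LAs with no sequences are left out of the sum because their weight is undefined.

```python
def effective_sample_size(cases, sequences):
    """ESS for one day: sum over LAs of weight times number of sequences.

    cases, sequences: dicts of LA -> windowed count, with the same keys.
    LAs with zero sequences are skipped (an assumption).
    """
    v = {la: cases[la] / sequences[la] for la in sequences if sequences[la] > 0}
    if not v:
        return 0.0
    v_star = max(v.values())
    return sum((v[la] / v_star) * sequences[la] for la in v)


# Toy windowed counts for three hypothetical LAs on one day.
cases_by_la = {"A": 100, "B": 200, "C": 50}
sequences_by_la = {"A": 10, "B": 10, "C": 25}

ess = effective_sample_size(cases_by_la, sequences_by_la)

# Closed form: total cases multiplied by the smallest coverage.
min_coverage = min(sequences_by_la[la] / cases_by_la[la] for la in cases_by_la)
closed_form = sum(cases_by_la.values()) * min_coverage

print(ess, closed_form)
```

In this toy example there are 45 sequences in total, but the ESS is 17.5, because sequences from the better-sampled LAs A and C are downweighted.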

a0-weights.csv

In file a0-weights.csv, we store weights normalised by the LA mean. That is,
w_{i,j} = v_{i,j}/\bar{v}_i
where \bar{v}_i=\text{mean}_j(v_{i,j}) is the mean of v_{i,j} over days j for LA i.
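A minimal sketch of the mean normalisation for a single LA follows. The function name and values are hypothetical, and the sketch does not read or write a0-weights.csv or assume anything about its column layout. Unlike the max-normalised weights, these weights can exceed 1, and they average to 1 over the days included.

```python
from statistics import fmean


def mean_normalised_weights(v_by_day):
    """Map each day to v divided by the mean of v over all days for one LA.

    v_by_day: dict of day -> cases per sequence. Days where v is
    undefined are assumed to have been dropped beforehand.
    """
    if not v_by_day:
        return {}
    v_mean = fmean(v_by_day.values())
    return {day: value / v_mean for day, value in v_by_day.items()}


# Toy cases-per-sequence values for one hypothetical LA over three days.
v_by_day = {"2021-01-18": 10.0, "2021-01-19": 20.0, "2021-01-20": 30.0}

weights_by_day = mean_normalised_weights(v_by_day)
print(weights_by_day)
```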