UK-lineage-dynamics-analysis

Large-scale virus genome sequencing reveals the genetic structure and importation dynamics of a national COVID-19 epidemic.

Primary language: R. License: Creative Commons Attribution 4.0 International (CC-BY-4.0).

Establishment & lineage dynamics of the SARS-CoV-2 epidemic in the UK

Louis du Plessis, John T. McCrone, Alexander E. Zarebski, Verity Hill, Christopher Ruis, Bernardo Gutierrez, Jayna Raghwani, Jordan Ashworth, Rachel Colquhoun, Thomas R. Connor, Nuno R. Faria, Ben Jackson, Nicholas J. Loman, Áine O’Toole, Samuel M. Nicholls, Kris V. Parag, Emily Scher, Tetyana I. Vasylyeva, Erik M. Volz, Alexander Watts, Isaac I. Bogoch, Kamran Khan, the COVID-19 Genomics UK (COG-UK) Consortium, David M. Aanensen, Moritz U. G. Kraemer, Andrew Rambaut, Oliver G. Pybus



This repository contains the data and code used to generate the results presented in https://doi.org/10.1101/2020.10.23.20218446. Some scripts may need adjusting for your local setup.

Because of the GISAID terms of use, genomic sequences cannot be shared in this repository. Instead, we provide the GISAID accession numbers and a table of acknowledgements. We also cannot make administrative level two (adm2) metadata for genomic sequences available. All genomic sequences produced by COG-UK are available here.
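Since only accessions are shared, users who obtain sequences from GISAID themselves will need to subset their download to the accessions used here. The sketch below is one possible way to do that, not part of the original pipeline. It assumes that each FASTA header contains a GISAID accession of the form `EPI_ISL_<digits>` somewhere in the header line; header layouts differ between GISAID exports, so the regular expression may need adapting. The function name `filter_fasta` is hypothetical.

```python
import re
from typing import Dict, Iterable, Set

# ASSUMPTION: every header contains a GISAID accession matching EPI_ISL_<digits>.
ACCESSION_RE = re.compile(r"EPI_ISL_\d+")


def filter_fasta(lines: Iterable[str], wanted: Set[str]) -> Dict[str, str]:
    """Return {accession: sequence} for FASTA records whose accession is in `wanted`.

    Multi-line sequences are concatenated; records without a matching
    accession are skipped.
    """
    records: Dict[str, str] = {}
    current = None  # accession of the record being kept, or None to skip it
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if current is not None:
                records[current] = "".join(chunks)
            match = ACCESSION_RE.search(line)
            current = match.group(0) if match and match.group(0) in wanted else None
            chunks = []
        elif current is not None:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = [
        ">hCoV-19/England/EXAMPLE-1/2020|EPI_ISL_100|2020-03-01",
        "ACGT",
        "NNAC",
        ">hCoV-19/Wales/EXAMPLE-2/2020|EPI_ISL_200|2020-03-02",
        "TTTT",
    ]
    print(filter_fasta(example, {"EPI_ISL_100"}))
```

In practice `wanted` would be read from the accession table in the data directory and `lines` would be an open file handle for the GISAID download; both file locations are left to the user.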

Abstract

The UK’s COVID-19 epidemic during early 2020 was one of the world’s largest and unusually well represented by virus genomic sampling. Here we reveal the fine-scale genetic lineage structure of this epidemic through analysis of 50,887 SARS-CoV-2 genomes, including 26,181 from the UK sampled throughout the country’s first wave of infection. Using large-scale phylogenetic analyses, combined with epidemiological and travel data, we quantify the size, spatio-temporal origins and persistence of genetically distinct UK transmission lineages. Rapid fluctuations in virus importation rates resulted in >1000 lineages; those introduced prior to national lockdown tended to be larger and more dispersed. Lineage importation and regional lineage diversity declined after lockdown, whilst lineage elimination was size-dependent. We discuss the implications of our genetic perspective on transmission dynamics for COVID-19 epidemiology and control.

Repository usage and structure

The structure of this repository is shown below:

uk-intros-analyses/
├── analyses
│   ├── epidemiological
│   ├── phylogenetic
│   ├── spatial
│   └── README.md
├── data
│   ├── epidemiological
│   ├── phylogenetic
│   ├── spatial
│   └── README.md
├── LICENSE.md
├── LICENSE.gpl.md
└── README.md
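Before running anything, it can help to confirm that a local checkout matches the tree above. The following is a minimal sketch, not a script shipped with the repository. It only checks for the paths listed in the tree, and the function name `missing_paths` is hypothetical.

```python
from pathlib import Path
from typing import List

# Both analyses/ and data/ are split into the same three topics in the tree.
TOPICS = ("epidemiological", "phylogenetic", "spatial")


def missing_paths(root: Path) -> List[str]:
    """Return the paths from the repository tree that do not exist under `root`."""
    expected = [f"{top}/{topic}" for top in ("analyses", "data") for topic in TOPICS]
    expected += [
        "analyses/README.md",
        "data/README.md",
        "LICENSE.md",
        "LICENSE.gpl.md",
        "README.md",
    ]
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    problems = missing_paths(Path("."))
    print("layout OK" if not problems else f"missing: {problems}")
```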

Input data

All input data that we are able to share publicly are stored in the data directory.

Analyses

The analyses directory contains one sub-directory for each type of analysis:

  • Epidemiological analyses: The epidemiological directory contains a README.org file describing how to run the included scripts to carry out the epidemiological analysis. The output goes into the results directory.
  • Phylogenetic analyses: The phylogenetic directory contains a README.md file describing how to run the analyses. Minimal output is included in the results directory.
  • Spatial analyses: The spatial directory contains a README.md file describing how to process adm2 regions. Output files are stored in the results directory.
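Note that the epidemiological analysis is documented in a README.org file, while the other two use README.md. A small helper like the sketch below can locate the instructions and results for each analysis. It is illustrative only: the name `analysis_paths` is hypothetical, and it assumes that each results directory sits inside its own analysis directory (`analyses/<analysis>/results`), which you should verify against the sub-directory READMEs.

```python
from pathlib import Path
from typing import Dict, Tuple

# Instructions file for each analysis, as listed above.
README_NAME = {
    "epidemiological": "README.org",
    "phylogenetic": "README.md",
    "spatial": "README.md",
}


def analysis_paths(repo_root: Path) -> Dict[str, Tuple[Path, Path]]:
    """Map each analysis to (instructions file, results directory).

    ASSUMPTION: results live at analyses/<analysis>/results.
    """
    base = repo_root / "analyses"
    return {
        name: (base / name / readme, base / name / "results")
        for name, readme in README_NAME.items()
    }


if __name__ == "__main__":
    for name, (readme, results) in analysis_paths(Path(".")).items():
        print(f"{name}: read {readme}, outputs in {results}")
```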

License

Except where otherwise noted, the content of this project is licensed under the Creative Commons Attribution 4.0 International License, and all source code (unless otherwise noted) is licensed under the GNU General Public License v3.0.