This folder contains Snakemake [Köster et al., 2012] pipelines for reconstructing the evolutionary history of the Zika virus.
The pipeline steps are detailed below.
The input data are located in the data folder and consist of (1) Vietnamese sequences in the file Vietnam.fa and (2) the sequences in genbank_20200811_org_Zika_virus_len_8000_14000.fa, which were downloaded from GenBank [Benson et al., 2013] on 2020/08/11 with the search terms organism “Zika virus” and sequence length between 8000 and 14000 (full genomes).
The input GenBank sequences were annotated with their collection date (collection_date) and country using Entrez [NCBI Resource Coordinators, 2012].
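A minimal sketch of extracting these two fields, assuming the GenBank flat-file text of each record has already been fetched (for example with an Entrez efetch request in GenBank format). The function name parse_source_qualifiers and the example record are hypothetical, the regular expression only handles qualifier values that fit on one line, and keeping only the part of the country value before the colon is an assumption about how regions were handled.

```python
import re

# Matches single-line /collection_date="..." and /country="..." qualifiers.
QUALIFIER_RE = re.compile(r'/(collection_date|country)="([^"]*)"')


def parse_source_qualifiers(genbank_text):
    """Return the collection date and country found in a GenBank record.

    Assumption: a country value such as "Brazil: Sao Paulo" is reduced to the
    part before the colon. Missing qualifiers are returned as None.
    """
    found = dict(QUALIFIER_RE.findall(genbank_text))
    country = found.get("country")
    if country is not None:
        country = country.split(":")[0].strip()
    return {"collection_date": found.get("collection_date"), "country": country}


# Hypothetical excerpt of a "source" feature from a GenBank flat file.
example = '''
     source          1..10807
                     /organism="Zika virus"
                     /country="Brazil: Sao Paulo"
                     /collection_date="2016-01-15"
'''
print(parse_source_qualifiers(example))
```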
The sequences were typed (African vs. Asian) with Genome Detective [Vilsker et al., 2019], and those with a type support below 100 were removed.
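The support filter can be sketched as follows, assuming the Genome Detective results have already been parsed into a mapping from sequence ID to (assigned type, support). The actual Genome Detective output format is not described here, so this data structure, the function name and the example IDs are hypothetical.

```python
def filter_by_type_support(typing_results, min_support=100):
    """Keep only sequences whose type assignment has support >= min_support.

    typing_results maps sequence ID -> (assigned type, support value).
    Returns a mapping sequence ID -> assigned type for the kept sequences.
    """
    return {
        seq_id: genotype
        for seq_id, (genotype, support) in typing_results.items()
        if support >= min_support
    }


# Hypothetical typing results, for illustration only.
typing = {
    "seq1": ("Asian", 100),
    "seq2": ("African", 100),
    "seq3": ("Asian", 87),  # dropped: support below 100
}
print(filter_by_type_support(typing))
```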
The sequences were aligned against the reference [Theys et al., 2017] with MAFFT [Katoh and Standley, 2013], and the reference was then removed from the alignment.
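Removing the reference after alignment amounts to dropping one record from the aligned FASTA. The sketch below assumes the MAFFT output has been read as text; the reference ID REF_HYPOTHETICAL and the toy sequences are placeholders, and the length check is only a sanity test that the input is an alignment.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping ID -> sequence (insertion-ordered)."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def drop_reference(alignment, ref_id):
    """Return a copy of the alignment without the reference sequence."""
    if ref_id not in alignment:
        raise KeyError(f"reference {ref_id!r} not found in alignment")
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    return {seq_id: seq for seq_id, seq in alignment.items() if seq_id != ref_id}


def to_fasta(alignment):
    """Serialize an ID -> sequence mapping back to FASTA text."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in alignment.items())


aligned = read_fasta(""">REF_HYPOTHETICAL
ATG-AAACCC
>seq1
ATGTAAACCC
>seq2
ATG-AAAC-C
""")
print(to_fasta(drop_reference(aligned, "REF_HYPOTHETICAL")))
```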
The pipeline for metadata extraction, sequence combining and alignment, Snakefile_combined_MSA, is available in the snakemake folder and can be rerun (from the snakemake folder) as:
snakemake --snakefile Snakefile_combined_MSA --keep-going --use-singularity --singularity-args "--home ~"
We reconstructed a maximum likelihood (ML) tree from the DNA sequences, partitioning the alignment into two groups: codon positions 1-2 and codon position 3. The tree was reconstructed with two ML tools that support partitioning, RAxML-NG [Stamatakis, 2014] and IQ-TREE 2 [Minh et al., 2020], using the GTRGAMMA+G6+I model. The two resulting trees had different topologies, and the one with the better likelihood was selected.
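The partitioning and the final selection step can be sketched as below. This assumes that column 1 of the aligned sequence is the first position of a codon; the function names, trees and log-likelihood values are hypothetical, and the real analysis passes the partitions to RAxML-NG and IQ-TREE 2 rather than splitting sequences by hand.

```python
def split_codon_positions(sequence):
    """Split an aligned sequence into codon positions 1+2 and position 3.

    Assumption: column 1 of the sequence is the first position of a codon.
    """
    pos12 = "".join(c for i, c in enumerate(sequence, start=1) if i % 3 != 0)
    pos3 = "".join(c for i, c in enumerate(sequence, start=1) if i % 3 == 0)
    return pos12, pos3


def select_best_tree(candidates):
    """Return (tool, (newick, log_likelihood)) with the highest log-likelihood."""
    return max(candidates.items(), key=lambda item: item[1][1])


print(split_codon_positions("ATGAAACCC"))

# Hypothetical trees and log-likelihoods, for illustration only.
trees = {
    "RAxML-NG": ("((A,B),(C,D));", -10523.7),
    "IQ-TREE 2": ("((A,C),(B,D));", -10519.2),
}
tool, (newick, lnl) = select_best_tree(trees)
print(tool, newick, lnl)
```

Comparing two log-likelihoods directly is only meaningful if both are computed on the same alignment, partitioning and model; re-evaluating both topologies with a single program is a safer way to make this comparison.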
Non-informative branches (with a length of at most 1/2 mutation) were then collapsed, and the tree was rooted with the African outgroup, which was then removed.
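A minimal sketch of collapsing short internal branches on a toy tree structure. It assumes branch lengths are in substitutions per site, so half a mutation corresponds to 0.5 divided by the alignment length; the Node class, the alignment length and the example tree are hypothetical, and only internal branches are collapsed (tips are kept).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str = ""
    dist: float = 0.0  # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse_short_branches(node, threshold):
    """Collapse internal branches of length <= threshold, in place.

    Children of a collapsed node are attached to its parent, and the collapsed
    branch length is added to theirs so root-to-tip distances are preserved.
    """
    new_children = []
    for child in node.children:
        collapse_short_branches(child, threshold)  # process the subtree first
        if child.children and child.dist <= threshold:
            for grandchild in child.children:
                grandchild.dist += child.dist
                new_children.append(grandchild)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node):
    """Serialize a subtree in Newick format (without the final semicolon)."""
    if not node.children:
        return f"{node.name}:{node.dist:g}"
    inner = ",".join(to_newick(child) for child in node.children)
    return f"({inner}){node.name}:{node.dist:g}"


alignment_length = 10000  # hypothetical alignment length
threshold = 0.5 / alignment_length  # half a mutation, in substitutions per site
tree = Node("root", 0.0, [
    Node("X", 1e-5, [Node("A", 0.001), Node("B", 0.002)]),
    Node("C", 0.003),
    Node("D", 0.0),  # zero-length tip branch: tips are never collapsed
])
collapse_short_branches(tree, threshold)
print(to_newick(tree) + ";")
```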
The phylogeny reconstruction pipeline, Snakefile_phylogeny, is available in the snakemake folder and can be rerun (from the snakemake folder) as:
snakemake --snakefile Snakefile_phylogeny --keep-going --use-singularity --singularity-args "--home ~"
The phylogeny was dated with LSD2 [To et al., 2015], with temporal outlier removal. For comparison, it was also dated with TreeTime [Sagulenko et al., 2018]. We then reconstructed the ancestral states of the country character with PastML [Ishikawa et al., 2018], both on the full dated tree and on subsampled trees, to assess the robustness of the phylogeographic predictions.
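LSD2 fits node dates by least squares over the whole tree and applies its own outlier criterion, so the sketch below is not LSD2's algorithm. It is a simplified root-to-tip regression, assuming root-to-tip distances and sampling dates are already available for each tip, that shows the strict-clock idea behind dating and how tips that do not fit the clock can be flagged. The function names, the residual-based criterion and the threshold k are assumptions, and the data are synthetic.

```python
import math


def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope estimates a strict-clock rate and the
    x-intercept estimates the date of the root.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate  # date at which the fitted distance is zero
    return rate, root_date


def flag_temporal_outliers(dates, distances, k=3.0):
    """Return indices of tips whose residual exceeds k times the RMS residual.

    Assumption: a simple residual threshold, not LSD2's outlier criterion.
    """
    rate, root_date = root_to_tip_regression(dates, distances)
    residuals = [y - rate * (x - root_date) for x, y in zip(dates, distances)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return [i for i, r in enumerate(residuals) if abs(r) > k * rms]


# Synthetic tips sampled yearly from 2000 to 2019 on a strict clock of
# 1e-3 substitutions/site/year since 1990, plus one tip with an excess distance.
dates = [2000.0 + i for i in range(20)] + [2010.0]
distances = [1e-3 * (d - 1990.0) for d in dates[:20]] + [0.06]

outliers = flag_temporal_outliers(dates, distances)
kept = [i for i in range(len(dates)) if i not in outliers]
rate, root_date = root_to_tip_regression([dates[i] for i in kept],
                                         [distances[i] for i in kept])
print(outliers, rate, root_date)
```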
To perform tree dating, from the snakemake folder, run the Snakefile_dating pipeline:
snakemake --snakefile Snakefile_dating --keep-going --use-singularity --singularity-args "--home ~"
To perform phylogeographic analysis, from the snakemake folder, run the Snakefile_phylogeography pipeline:
snakemake --snakefile Snakefile_phylogeography --keep-going --use-singularity --singularity-args "--home ~"