jeromekelleher/sc2ts-paper

Notes on preprocessing data for run on Viridian 0.4


szhan commented

Write down some details about deduplicating sample sequences and MAFFT alignment (version and options).

szhan commented

Consensus sequences assembled with Viridian by Hunt et al. (2024) were downloaded from Figshare (Viridian v0.4). We used only the sequences included in the Viridian phylogeny that Hunt et al. built using UShER.

The sequences were aligned using MAFFT v7.525 (2024/Mar/13) with the flags '--keeplength --add', as in Hunt et al., except that gaps were kept rather than being subsequently filled with reference bases. See #212.
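For reference, a minimal sketch of how the MAFFT call could be scripted from Python. Only the '--keeplength' and '--add' flags come from the notes above; the file names, the helper names, and the assumption that the samples are added to a single reference FASTA are illustrative, not the exact command used for the paper.

```python
import subprocess


def build_mafft_command(reference_fasta, samples_fasta, mafft="mafft"):
    """Build the argument list for adding sequences to a reference.

    '--add' takes the file of new sequences as its argument, and
    '--keeplength' keeps the alignment at the length of the existing
    (reference) sequence. File names here are hypothetical.
    """
    return [mafft, "--keeplength", "--add", str(samples_fasta), str(reference_fasta)]


def align_to_reference(reference_fasta, samples_fasta, output_fasta):
    """Run MAFFT and write the alignment it prints on stdout to a file."""
    cmd = build_mafft_command(reference_fasta, samples_fasta)
    with open(output_fasta, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

Calling `align_to_reference("reference.fasta", "samples.fasta", "aligned.fasta")` requires `mafft` on the PATH. No gap-filling step follows in this sketch, which matches keeping the gaps as described above.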

Some samples have multiple replicate sequences, for example, produced using two different sequencing protocols. For these samples, one replicate sequence was chosen using a set of criteria based on the metadata file. See #209.
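A rough sketch of the replicate selection, to show the general shape only. The actual criteria are in #209 and are not reproduced here; the column names ("sample", "run", "score") and the rule "highest score wins, ties broken by run accession" are placeholders, not the real fields of the metadata file.

```python
import csv
import io


def choose_replicates(rows, sample_key="sample", run_key="run", score_key="score"):
    """Pick one run per sample from an iterable of metadata rows (dicts).

    Placeholder rule: prefer the highest score, then the smallest run
    accession. The real criteria (see #209) may differ.
    """
    best = {}
    for row in rows:
        sample = row[sample_key]
        # Smaller key is better: negate the score so a higher score sorts first.
        key = (-float(row[score_key]), row[run_key])
        if sample not in best or key < best[sample][0]:
            best[sample] = (key, row[run_key])
    return {sample: run for sample, (_, run) in best.items()}


# Toy metadata with hypothetical columns; S1 has two replicate runs.
example_tsv = "sample\trun\tscore\nS1\tR1\t0.90\nS1\tR2\t0.95\nS2\tR3\t0.80\n"
rows = csv.DictReader(io.StringIO(example_tsv), delimiter="\t")
chosen = choose_replicates(rows)
```

With the real metadata, the rows could be read by opening the gzipped TSV with `gzip.open(path, "rt")` and passing it to `csv.DictReader` in the same way.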

URLs
Viridian consensus sequences input to UShER
Viridian_tree_cons_seqs.tar

Metadata file
run_metadata.v04.tsv.gz

Citations
Hunt et al. (2024) https://doi.org/10.1101/2024.04.29.591666
Katoh & Standley (2013) https://doi.org/10.1093/molbev/mst010
Katoh et al. (2002) https://academic.oup.com/nar/article/30/14/3059/2904316