/Herpesvirus-Glycoprotein-Analysis

Analysis of natural diversity in herpesvirus glycoproteins

Primary language: Python. License: MIT.

Phylogeny analysis for herpesvirus glycoproteins

This repo contains one directory per herpesvirus, each analyzing the diversity of specific glycoproteins with its own Snakemake pipeline. The pipeline downloads sequences based on a list of accessions, processes the sequences, and constructs a Nextstrain phylogenetic tree that can be viewed using Auspice.
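As a rough illustration of the download step, the sketch below reads an accession list and builds NCBI E-utilities `efetch` URLs in batches. This README does not say how the pipeline actually retrieves sequences, so the use of `efetch`, the `nuccore` database, the one-accession-per-line file format, and the function names `read_accessions` and `efetch_urls` are all assumptions made for illustration. No request is sent; the code only constructs the URLs.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (assumed here; the pipeline may use another route).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(text):
    """Return accessions from text with one per line, skipping blanks and '#' comments."""
    accessions = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            accessions.append(line)
    return accessions


def efetch_urls(accessions, batch_size=100):
    """Build one FASTA efetch URL per batch of accessions (hypothetical helper)."""
    urls = []
    for start in range(0, len(accessions), batch_size):
        batch = accessions[start:start + batch_size]
        query = urlencode({
            "db": "nuccore",
            "id": ",".join(batch),
            "rettype": "fasta",
            "retmode": "text",
        })
        urls.append(f"{EFETCH}?{query}")
    return urls


if __name__ == "__main__":
    # The three reference accessions named in this README, used as sample input.
    sample = "# reference strains\nNC_001806.2\n\nNC_007605.1\nNC_001798.2\n"
    for url in efetch_urls(read_accessions(sample), batch_size=2):
        print(url)
```

Batching keeps each URL short when the accession list is long; the batch size of 100 is an arbitrary choice for this sketch.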

The numbering scheme of each protein is relative to the NCBI Virus reference strains (HSV-1: NC_001806.2, EBV: NC_007605.1, HSV-2: NC_001798.2).

Analysis performed by Caleb Carr.

Nextstrain visualizations of the trees

The trees can be colored by several features (e.g., genotype, date, country) by selecting the corresponding option in the Color By dropdown. For example, the HSV-1 gB tree can be colored by amino acid identity at a position by selecting Genotype in the dropdown menu and then selecting HSV1_gB. Entering a position will then color the tree by the amino acid identity at that position. Note that the protein numbering is relative to the NCBI Virus reference strains (HSV-1: NC_001806.2, EBV: NC_007605.1, HSV-2: NC_001798.2). Other features can be viewed by mousing over or clicking on the nodes and branches of the tree.
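Coloring by amino acid identity at a position can be pictured as grouping strains by the residue they carry at that reference-numbered site. The sketch below does this on a toy alignment; it is not the code Auspice or this repo runs. The names `column_for_position` and `residues_at_position`, the toy sequences, and the strain labels are all hypothetical.

```python
from collections import defaultdict


def column_for_position(aligned_reference, position, gap="-"):
    """Return the 0-based alignment column holding the given 1-based reference position."""
    seen = 0
    for column, residue in enumerate(aligned_reference):
        if residue != gap:
            seen += 1
            if seen == position:
                return column
    raise ValueError(f"reference has no position {position}")


def residues_at_position(alignment, reference_name, position):
    """Group sequence names by the residue found at a reference-numbered position."""
    column = column_for_position(alignment[reference_name], position)
    groups = defaultdict(list)
    for name, sequence in alignment.items():
        groups[sequence[column]].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Toy alignment: strain_A has an insertion relative to the reference.
    toy = {
        "reference": "MK-LV",
        "strain_A": "MKALV",
        "strain_B": "MR-LV",
    }
    print(residues_at_position(toy, "reference", 2))
```

Gap columns in the reference are skipped when counting, so an insertion in another strain does not shift the numbering of downstream sites.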

HSV-1:

EBV:

HSV-2:

Protein alignments of glycoproteins

Alignments are constructed relative to the NCBI Virus reference strains (HSV-1: NC_001806.2, EBV: NC_007605.1, HSV-2: NC_001798.2).
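One common way to make an alignment follow reference numbering is to drop every column where the reference has a gap, so that column i of the result corresponds to reference position i. The sketch below shows that idea on a toy alignment. Whether the pipelines in this repo do exactly this is not stated here, and the function name `project_onto_reference` and the toy sequences are hypothetical.

```python
def project_onto_reference(alignment, reference_name, gap="-"):
    """Keep only the alignment columns where the reference sequence is not a gap."""
    reference = alignment[reference_name]
    keep = [i for i, residue in enumerate(reference) if residue != gap]
    return {
        name: "".join(sequence[i] for i in keep)
        for name, sequence in alignment.items()
    }


if __name__ == "__main__":
    # Toy alignment: strain_A carries an insertion that the reference lacks.
    toy = {
        "reference": "MK-LV",
        "strain_A": "MKALV",
        "strain_B": "MR-LV",
    }
    for name, sequence in project_onto_reference(toy, "reference").items():
        print(name, sequence)
```

The trade-off is that insertions relative to the reference are discarded, while deletions remain visible as gap characters in the affected strain.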

HSV-1:

EBV:

HSV-2:

Organization of this repo

  • HSV1: Contains the analysis workflow for HSV-1
  • EBV: Contains the analysis workflow for EBV
  • HSV2: Contains the analysis workflow for HSV-2
  • auspice: Contains the final Nextstrain tree files for each herpesvirus, which can then be viewed through Nextstrain community sharing via GitHub. Note that these final files are copied manually from each individual herpesvirus directory.