This README describes the scripts used for the sequence analysis in:
A structural explanation for the low effectiveness of the seasonal influenza H3N2 vaccine
This analysis is adapted from McWhite et al. 2016
- Fasta/pdmH1N1_All.fa: 2009 pandemic H1N1 (swine flu) HA sequences downloaded from GISAID
- Fasta/HumanH3N2_All.fa: Human H3N2 HA sequences downloaded from GISAID
- Since GISAID limits the number of sequences that can be downloaded at once, sequences for this project were first downloaded separately by continent. The sequences from different continents were then combined into a single Fasta file.
- Fasta/Bris07_fromNCBI.fa: 11 Bris07 sequences from the NCBI protein database were obtained by searching "A/Brisbane/10/2007", hemagglutinin.
- Fasta/HK14_fromGISAID.fa: 8 HK14 sequences from GISAID were obtained by searching "A/Hong Kong/4801/2014".
- Fasta/Sing16_fromGISAID.fa: 4 Sing16 sequences from GISAID were obtained by searching "A/Singapore/INFIMH-16-0019/2016".
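The step of combining the per-continent downloads into a single Fasta file could look roughly like the sketch below. The function name `combine_fasta` and the per-continent file names in the comment are hypothetical; this README does not state how the files were actually merged.

```python
from pathlib import Path

def combine_fasta(parts, combined):
    """Concatenate FASTA files, making sure each part ends with a newline
    so that the next header always starts on its own line."""
    with open(combined, "w") as out:
        for part in parts:
            text = Path(part).read_text()
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)

# Hypothetical usage (the per-continent file names are assumptions):
# combine_fasta(["Fasta/HumanH3N2_Asia.fa", "Fasta/HumanH3N2_Europe.fa"],
#               "Fasta/HumanH3N2_All.fa")
```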
- Multiple sequence alignment (MSA) using MAFFT version 7.157b
  - mafft --auto Fasta/pdmH1N1_All.fa > Fasta/pdmH1N1_All.aln
  - mafft --auto Fasta/HumanH3N2_All.fa > Fasta/HumanH3N2_All.aln
- Parse MSA files to extract information on egg-passaged isolates
  - python script/ParseGISAIDaln.py
    - Input files:
      - Fasta/pdmH1N1_All.aln
      - Fasta/HumanH3N2_All.aln
    - Output files:
      - result/HumanH3N2_Pos194YearVsPSG.tsv
      - result/HumanH3N2_EggOri.fa
      - result/HumanH3N2_PSG.tsv
      - result/pdmH1N1_Pos194YearVsPSG.tsv
      - result/pdmH1N1_EggOri.fa
      - result/pdmH1N1_PSG.tsv
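Identifying egg-passaged isolates depends on the passage annotation attached to each sequence. Below is a minimal sketch of one way to count egg passages from such an annotation. It assumes annotations such as "E3" or "E2/E1"; both that format and the function name `count_egg_passages` are assumptions and are not taken from script/ParseGISAIDaln.py.

```python
import re

def count_egg_passages(passage):
    """Return the total number of egg passages in an annotation, or None
    if the annotation is not made up purely of egg passages.

    Assumed format: steps such as "E3", separated by "/", "," or spaces.
    """
    steps = re.split(r"[/,\s]+", passage.strip().upper())
    total = 0
    for step in steps:
        match = re.fullmatch(r"E(\d+)", step)
        if match is None:
            # Anything that is not an egg step (or an empty annotation)
            # means the isolate is not treated as purely egg-passaged.
            return None
        total += int(match.group(1))
    return total if total > 0 else None
```

Under these assumptions, `count_egg_passages("E2/E1")` gives 3, while an annotation that includes a cell-culture step gives `None`.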
- Plot the frequency of different amino acids observed at residue 194 in different years
  - Rscript script/Plot_YearVsPSG.R
    - Input files:
      - result/H3N2_Pos194YearVsPSG.tsv
      - result/pdmH1N1_Pos194YearVsPSG.tsv
    - Output files:
      - graph/H3N2_YearVsAA_resi194.png
      - graph/pdmH1N1_YearVsAA_resi194.png
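The quantity being plotted is, for each year, the fraction of sequences carrying each amino acid at residue 194. Below is a minimal sketch of that calculation, assuming the observations are available as (year, amino acid) pairs. The column layout of the .tsv files is not documented here, and the function name `residue_frequencies` is hypothetical.

```python
from collections import Counter, defaultdict

def residue_frequencies(observations):
    """Map each year to {amino_acid: fraction of that year's sequences}."""
    counts = defaultdict(Counter)
    for year, amino_acid in observations:
        counts[year][amino_acid] += 1
    return {
        year: {aa: n / sum(tally.values()) for aa, n in tally.items()}
        for year, tally in counts.items()
    }
```

The same grouping could be keyed on the number of egg passages instead of the year to get the fraction of sequences with Pro at residue 194 per passage number.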
- Plot the frequency of L194P against the number of passages in eggs
  - Rscript script/Plot_ProVsPSG.R
    - Input file:
      - result/HumanH3N2_PSG.tsv
    - Output file:
      - graph/HumanH3N2_ProVsPSG.png
- Multiple sequence alignment (MSA) using MAFFT version 7.157b
  - mafft --auto Fasta/Bris07_fromNCBI.fa > Fasta/Bris07_fromNCBI.aln
- Parse MSA files to extract the amino-acid identity at residue 194
  - python script/ParseNCBIseq.py
    - Input file:
      - Fasta/Bris07_fromNCBI.aln
    - Output: standard output
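Reading the amino acid at a given residue from an alignment requires mapping the residue number to an alignment column, since gaps shift the columns. Below is a minimal sketch of that lookup. It assumes the alignment is in FASTA format and that the position is counted along the ungapped reference sequence in the alignment, which may be offset from H3 numbering (for example by the signal peptide). All function names are hypothetical and not taken from script/ParseNCBIseq.py.

```python
def read_aligned_fasta(path):
    """Return {header: aligned sequence} from a FASTA-format alignment."""
    records, header = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                header = line[1:]
                records[header] = ""
            elif header is not None:
                records[header] += line
    return records

def alignment_column(reference, position):
    """Return the 0-based alignment column that holds the 1-based,
    ungapped `position` of the aligned `reference` sequence."""
    seen = 0
    for column, char in enumerate(reference):
        if char != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position is beyond the end of the reference")

def residues_at(records, reference_id, position):
    """Return {header: residue} at the column matching `position`."""
    column = alignment_column(records[reference_id], position)
    return {header: seq[column] for header, seq in records.items()}
```

Printing the result of `residues_at` for each record would mirror the standard-output behaviour described in the step above.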