This README describes the scripts used for the analyses in:
Diversity of functionally permissive sequences in the receptor-binding site of influenza hemagglutinin
This study examines the functional sequence diversity and epistasis of the influenza A virus hemagglutinin (HA) receptor-binding site (RBS). A deep mutational scanning experiment was performed on the HA RBS of two strains, A/WSN/33 (WSN; H1N1) and A/Hong Kong/1/1968 (HK68; H3N2). The experiment probed the fitness effects of mutants carrying up to three amino-acid substitutions.
- All raw sequencing reads, which can be downloaded from the NIH SRA database (PRJNA353496), should be placed in the fastq/ folder:
- WSN single mutant input library: fastq/WSN_HARBS-1_S12_L001_R1_001.fastq and fastq/WSN_HARBS-1_S12_L001_R2_001.fastq
- WSN single mutant passaged library: fastq/WSN_HARBS-4_S15_L001_R1_001.fastq and fastq/WSN_HARBS-4_S15_L001_R2_001.fastq
- WSN double mutant input library: fastq/WSN_HARBS-5_S16_L001_R1_001.fastq and fastq/WSN_HARBS-5_S16_L001_R2_001.fastq
- WSN double mutant passaged library: fastq/WSN_HARBS-2_S13_L001_R1_001.fastq and fastq/WSN_HARBS-2_S13_L001_R2_001.fastq
- WSN triple mutant input library: fastq/WSN_HARBS-3_S14_L001_R1_001.fastq and fastq/WSN_HARBS-3_S14_L001_R2_001.fastq
- WSN triple mutant passaged library: fastq/WSN_HARBS-6_S17_L001_R1_001.fastq and fastq/WSN_HARBS-6_S17_L001_R2_001.fastq
- HK68 triple mutant input library: fastq/HK68-Tlib-1_S1_L001_R1_001.fastq and fastq/HK68-Tlib-1_S1_L001_R2_001.fastq
- HK68 triple mutant passaged library (round 1): fastq/HK68-Tlib-2_S2_L001_R1_001.fastq and fastq/HK68-Tlib-2_S2_L001_R2_001.fastq
- HK68 triple mutant passaged library (round 2): fastq/HK68-Tlib-3_S3_L001_R1_001.fastq and fastq/HK68-Tlib-3_S3_L001_R2_001.fastq
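
Before running the pipeline, it can help to confirm that every paired-end file listed above is present. The following is a minimal sketch, not part of the repository: the function name `missing_fastq` and the constant `SAMPLE_PREFIXES` are hypothetical, and the file names are simply rebuilt from the list above.

```python
from pathlib import Path

# Sample prefixes copied from the file list in this README.
SAMPLE_PREFIXES = [
    "WSN_HARBS-1_S12", "WSN_HARBS-4_S15",  # WSN single mutant: input, passaged
    "WSN_HARBS-5_S16", "WSN_HARBS-2_S13",  # WSN double mutant: input, passaged
    "WSN_HARBS-3_S14", "WSN_HARBS-6_S17",  # WSN triple mutant: input, passaged
    "HK68-Tlib-1_S1",                      # HK68 triple mutant: input
    "HK68-Tlib-2_S2", "HK68-Tlib-3_S3",    # HK68 passaged: round 1, round 2
]


def missing_fastq(fastq_dir="fastq", prefixes=SAMPLE_PREFIXES):
    """Return the names of expected R1/R2 FASTQ files not found in fastq_dir."""
    missing = []
    for prefix in prefixes:
        for read in ("R1", "R2"):
            path = Path(fastq_dir) / f"{prefix}_L001_{read}_001.fastq"
            if not path.is_file():
                missing.append(path.name)
    return missing


if __name__ == "__main__":
    absent = missing_fastq()
    if absent:
        print("Missing FASTQ files:")
        for name in absent:
            print("  " + name)
    else:
        print("All expected FASTQ files are present.")
```

Run from the repository root, the sketch prints any files that still need to be downloaded.
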
- script/WSN_HARBS_read2RFindex.py: Convert raw reads to counts and RF index
- Input file: fastq/WSN*.fastq
- Output file: result/WSN*.count
- script/WSN_HARBS_compileMutFit.py: Compile *.count files into a single file
- Input file: result/WSN*.count
- Output file: result/WSN_MutFitTable.tsv
- script/WSN_HARBS_CalMaxFit.py: Calculate the maximum RF index among all mutants that carried a specified substitution
- Input file: result/WSN_MutFitTable.tsv
- Output file: result/WSN_MaxFitMut_*.tsv
- script/WSN_HARBS_CrypticBen.py: Search for the maximum beneficial effect for a given substitution of interest (max. RF increase) among all genetic backgrounds
- Input file: result/WSN_MutFitTable.tsv
- Output file: result/WSN_crypticben.tsv
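
The two summaries computed by the CalMaxFit and CrypticBen scripts can be illustrated with a small sketch. It is not the code in script/ and makes several assumptions: variants are represented as frozensets of substitution labels (the labels `A1B` and `C2D` are placeholders), the wild type is the empty set, the RF increase of a substitution is taken as the difference between a variant's RF index and that of the same variant without the substitution, and the RF values are toy numbers, not study data.

```python
WT = frozenset()  # assumption: the wild type carries no substitutions


def max_fit_per_substitution(fit):
    """For each substitution, the highest RF index among variants carrying it."""
    best = {}
    for variant, rf in fit.items():
        for sub in variant:
            if sub not in best or rf > best[sub]:
                best[sub] = rf
    return best


def max_cryptic_benefit(fit):
    """For each substitution, the largest RF increase over any measured background."""
    best = {}
    for variant, rf in fit.items():
        for sub in variant:
            background = variant - {sub}
            if background not in fit:
                continue  # background was not measured, so no comparison is possible
            gain = rf - fit[background]
            if sub not in best or gain > best[sub]:
                best[sub] = gain
    return best


# Toy RF index table (illustrative values only).
toy_fit = {
    WT: 1.0,
    frozenset({"A1B"}): 0.25,
    frozenset({"C2D"}): 0.5,
    frozenset({"A1B", "C2D"}): 0.75,
}

if __name__ == "__main__":
    print(max_fit_per_substitution(toy_fit))
    print(max_cryptic_benefit(toy_fit))
```

In the toy table, `A1B` is deleterious on the wild type but raises the RF index when `C2D` is already present, which is the kind of background-dependent benefit the CrypticBen scripts search for.
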
- script/HK68_HARBS_read2RFindex.py: Convert raw reads to counts and RF index
- Input file: fastq/HK68*.fastq
- Output file: result/HK68_Tlib.count
- script/HK68_HARBS_compileMutFit.py: Compile *.count files into a single file and also generate a combined file with WSN
- Input file:
- result/HK68_Tlib.count
- result/WSN_MutFitTable.tsv
- Output file:
- result/HK68_MutFitTable.tsv
- result/MutCompareTable.tsv
- script/HK68_HARBS_CalMaxFit.py: Calculate the maximum RF index among all mutants that carried a specified substitution
- Input file: result/HK68_MutFitTable.tsv
- Output file: result/HK68_MaxFitMut_*.tsv
- script/HK68_HARBS_CrypticBen.py: Search for the maximum beneficial effect for a given substitution of interest (max. RF increase) among all genetic backgrounds
- Input file: result/HK68_MutFitTable.tsv
- Output file: result/HK68_crypticben.tsv
- script/EpisCount.py: Count reciprocal sign epistasis
- Input file:
- result/WSN_MutFitTable.tsv
- result/HK68_MutFitTable.tsv
- Output file: result/EpisCountAroundWT.tsv
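
Reciprocal sign epistasis, which EpisCount.py counts, can be sketched as follows. This uses the standard definition (the sign of each mutation's fitness effect flips depending on whether the other mutation is present) and may differ in detail from the script, for example in how ties or measurement noise are handled. The function names and the `tol` parameter are hypothetical.

```python
def sign(x, tol=0.0):
    """Return +1, -1, or 0 for x above tol, below -tol, or within tolerance."""
    return (x > tol) - (x < -tol)


def is_reciprocal_sign_epistasis(f_wt, f_a, f_b, f_ab, tol=0.0):
    """True if A and B each change the sign of the other's fitness effect.

    f_wt, f_a, f_b, f_ab are fitness values (e.g. RF index) of the wild type,
    the two single mutants, and the double mutant.
    """
    a_on_wt = sign(f_a - f_wt, tol)   # effect of A on the wild-type background
    a_on_b = sign(f_ab - f_b, tol)    # effect of A when B is already present
    b_on_wt = sign(f_b - f_wt, tol)   # effect of B on the wild-type background
    b_on_a = sign(f_ab - f_a, tol)    # effect of B when A is already present
    # Opposite signs give a negative product; a zero effect never counts as a flip.
    return a_on_wt * a_on_b < 0 and b_on_wt * b_on_a < 0


if __name__ == "__main__":
    # Both singles are deleterious, yet the double mutant recovers: reciprocal.
    print(is_reciprocal_sign_epistasis(1.0, 0.5, 0.5, 1.0))
    # Effects keep their direction on both backgrounds: not reciprocal.
    print(is_reciprocal_sign_epistasis(1.0, 0.5, 0.5, 0.25))
```
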
- script/WSN_HARBS_FitQC.R: Plot the RF index distribution of silent, missense, and nonsense mutations
- Input file: result/WSN*.count
- Output file: graph/WSN_FitQC_stripchart.png
- script/WSN_HARBS_PlotMaxFit.R: Plot the maximum RF index among all mutants that carried a specified substitution as heatmap and also plot their distribution
- Input file: result/WSN_MaxFitMut_*.tsv
- Output file: graph/WSN_MaxFitMut_*.png
- script/WSN_HARBS_CrypticBen.R: Plot the maximum beneficial effect for a given substitution of interest (max. RF increase) among all genetic backgrounds
- Input file: result/WSN_crypticben.tsv
- Output file: WSN_CrypticBen*.png
- script/HK68_HARBS_FitQC.R: Plot the RF index distribution of silent, missense, and nonsense mutations
- Input file:
- result/HK68_Tlib.count
- result/HK68_MutFitTable.tsv
- Output file: graph/HK68_FitQC_stripchart.png
- script/HK68_HARBS_PlotMaxFit.R: Plot the maximum RF index among all mutants that carried a specified substitution as heatmap and also plot their distribution
- Input file: result/HK68_MaxFitMut_*.tsv
- Output file: graph/HK68_MaxFitMut_*.png
- script/HK68_HARBS_CrypticBen.R: Plot the maximum beneficial effect for a given substitution of interest (max. RF increase) among all genetic backgrounds
- Input file: result/HK68_crypticben.tsv
- Output file: HK68_CrypticBen*.png
- script/EpisPlot.R: Compare reciprocal sign epistasis by barplot and perform Fisher's exact test
- Input file: result/EpisCountAroundWT.tsv
- Output file:
- graph/WSN_EpisCount.png
- graph/HK68_EpisCount.png
- script/HARBS_compare.R: Compare RF index of a given variant in WSN and HK68 backgrounds
- Input file: result/MutCompareTable.tsv
- Output file: graph/MutFitCompare.png
- script/EpiMap.R: Plot the pairwise position of reciprocal sign epistasis
- Input file:
- result/WSN_EpisMap.tsv
- result/HK68_EpisMap.tsv
- Output file:
- graph/WSN_EpiMap.png
- graph/HK68_EpiMap.png
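
For readers who want to reproduce the Fisher's exact test used in script/EpisPlot.R without R, the following is a minimal pure-Python sketch of a two-sided test on a 2x2 table. It sums the probabilities of all tables with the same margins that are no more likely than the observed one, which is the usual two-sided convention. Which counts the script actually places in the table is not stated in this README, so the example table is a toy, not study data, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties with the observed table are included.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Toy table (not study data): rows are two groups, columns are yes/no counts.
    print(fisher_exact_two_sided(3, 1, 1, 3))
```
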