/HARBS

This README describes the scripts used for the analyses in:
Diversity of functionally permissive sequences in the receptor-binding site of influenza hemagglutinin

ANALYSIS FOR HA RBS DEEP MUTATIONAL SCANNING EXPERIMENTS

This study examines the functional sequence diversity and epistasis of the influenza A virus hemagglutinin (HA) receptor-binding site (RBS). Deep mutational scanning experiments were performed on the HA RBS of two strains, A/WSN/33 (WSN; H1N1) and A/Hong Kong/1/1968 (HK68; H3N2). The experiments probed the fitness effects of mutants containing up to three amino-acid substitutions.

INPUT FILE

  • All raw sequencing reads, which can be downloaded from the NIH SRA database (PRJNA353496), should be placed in the fastq/ folder:
    • WSN single mutant input library: fastq/WSN_HARBS-1_S12_L001_R1_001.fastq and fastq/WSN_HARBS-1_S12_L001_R2_001.fastq
    • WSN single mutant passaged library: fastq/WSN_HARBS-4_S15_L001_R1_001.fastq and fastq/WSN_HARBS-4_S15_L001_R2_001.fastq
    • WSN double mutant input library: fastq/WSN_HARBS-5_S16_L001_R1_001.fastq and fastq/WSN_HARBS-5_S16_L001_R2_001.fastq
    • WSN double mutant passaged library: fastq/WSN_HARBS-2_S13_L001_R1_001.fastq and fastq/WSN_HARBS-2_S13_L001_R2_001.fastq
    • WSN triple mutant input library: fastq/WSN_HARBS-3_S14_L001_R1_001.fastq and fastq/WSN_HARBS-3_S14_L001_R2_001.fastq
    • WSN triple mutant passaged library: fastq/WSN_HARBS-6_S17_L001_R1_001.fastq and fastq/WSN_HARBS-6_S17_L001_R2_001.fastq
    • HK68 triple mutant input library: fastq/HK68-Tlib-1_S1_L001_R1_001.fastq and fastq/HK68-Tlib-1_S1_L001_R2_001.fastq
    • HK68 triple mutant passaged library (round 1): fastq/HK68-Tlib-2_S2_L001_R1_001.fastq and fastq/HK68-Tlib-2_S2_L001_R2_001.fastq
    • HK68 triple mutant passaged library (round 2): fastq/HK68-Tlib-3_S3_L001_R1_001.fastq and fastq/HK68-Tlib-3_S3_L001_R2_001.fastq
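
Before running the pipeline, it can help to confirm that every read file listed above is in place. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the library prefixes are copied from the file list above.

```python
from pathlib import Path

# Library prefixes copied from the file list above.
LIBRARIES = [
    "WSN_HARBS-1_S12", "WSN_HARBS-4_S15",   # WSN single mutant: input, passaged
    "WSN_HARBS-5_S16", "WSN_HARBS-2_S13",   # WSN double mutant: input, passaged
    "WSN_HARBS-3_S14", "WSN_HARBS-6_S17",   # WSN triple mutant: input, passaged
    "HK68-Tlib-1_S1", "HK68-Tlib-2_S2", "HK68-Tlib-3_S3",  # HK68 triple mutant
]


def expected_fastq_files(folder="fastq"):
    """Return the expected paths (R1 and R2 for each library)."""
    return [Path(folder) / f"{lib}_L001_{read}_001.fastq"
            for lib in LIBRARIES for read in ("R1", "R2")]


def missing_fastq_files(folder="fastq"):
    """Return the expected paths that do not exist in `folder`."""
    return [p for p in expected_fastq_files(folder) if not p.is_file()]
```

Calling `missing_fastq_files()` from the repository root returns an empty list when all read files are present.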

ANALYSIS PIPELINE

FOR WSN

  1. script/WSN_HARBS_read2RFindex.py: Convert raw reads to counts and RF index
    • Input file: fastq/WSN*.fastq
    • Output file: result/WSN*.count
  2. script/WSN_HARBS_compileMutFit.py: Compile *.count files into a single file
    • Input file: result/WSN*.count
    • Output file: result/WSN_MutFitTable.tsv
  3. script/WSN_HARBS_CalMaxFit.py: Calculate the maximum RF index among all mutants that carry a specified substitution
    • Input file: result/WSN_MutFitTable.tsv
    • Output file: result/WSN_MaxFitMut_*.tsv
  4. script/WSN_HARBS_CrypticBen.py: Search for the maximum beneficial effect of a given substitution of interest (max. RF increase) among all genetic backgrounds
    • Input file: result/WSN_MutFitTable.tsv
    • Output file: result/WSN_crypticben.tsv
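
The first and third steps above can be illustrated with a short sketch. This README does not define the RF index, so the formula below is an assumption: a common convention in deep mutational scanning, where a mutant's enrichment (passaged count over input count) is divided by the wild-type enrichment. The function names and the mutant ID format (substitutions joined by "-") are hypothetical and may differ from what the scripts use.

```python
def rf_index(mut_input, mut_passaged, wt_input, wt_passaged):
    """Relative fitness: mutant enrichment divided by wild-type enrichment.

    ASSUMPTION: this formula is a common convention, not taken from the scripts.
    """
    if mut_input == 0 or wt_input == 0 or wt_passaged == 0:
        return None  # enrichment is undefined without these counts
    return (mut_passaged / mut_input) / (wt_passaged / wt_input)


def max_rf_per_substitution(fitness):
    """Map each substitution to the highest RF index among mutants carrying it.

    `fitness` maps a mutant ID (hypothetical format: substitutions joined
    by "-", e.g. "A1B-C2D") to its RF index.
    """
    best = {}
    for mutant, rf in fitness.items():
        for sub in mutant.split("-"):
            if sub not in best or rf > best[sub]:
                best[sub] = rf
    return best
```

Under this convention the wild type has an RF index of 1, and a substitution that looks deleterious on its own can still have a high maximum RF index if some multi-substitution mutant carrying it is fit.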

FOR HK68

  1. script/HK68_HARBS_read2RFindex.py: Convert raw reads to counts and RF index
    • Input file: fastq/HK68*.fastq
    • Output file: result/HK68_Tlib.count
  2. script/HK68_HARBS_compileMutFit.py: Compile *.count files into a single file and also generate a combined file with WSN
    • Input file:
      • result/HK68_Tlib.count
      • result/WSN_MutFitTable.tsv
    • Output file:
      • result/HK68_MutFitTable.tsv
      • result/MutCompareTable.tsv
  3. script/HK68_HARBS_CalMaxFit.py: Calculate the maximum RF index among all mutants that carry a specified substitution
    • Input file: result/HK68_MutFitTable.tsv
    • Output file: result/HK68_MaxFitMut_*.tsv
  4. script/HK68_HARBS_CrypticBen.py: Search for the maximum beneficial effect of a given substitution of interest (max. RF increase) among all genetic backgrounds
    • Input file: result/HK68_MutFitTable.tsv
    • Output file: result/HK68_crypticben.tsv
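
The "max. RF increase" search in the last step could be sketched as follows. This is an illustration under stated assumptions, not the logic of the CrypticBen scripts: mutant IDs are assumed to join substitutions with "-" in a consistent order, the wild type is assumed to have an RF index of 1, and "increase" is taken to mean the RF index of a mutant minus the RF index of the same mutant without the substitution of interest. The function name is hypothetical.

```python
def max_rf_increase(fitness, substitution, wt_rf=1.0):
    """Return (largest RF increase, background) for `substitution`.

    ASSUMPTIONS: mutant IDs join substitutions with "-" in a consistent
    order, the wild type has RF index `wt_rf`, and "increase" means
    RF(background + substitution) - RF(background).
    Backgrounds without a measurement are skipped.
    """
    best = None
    for mutant, rf in fitness.items():
        subs = mutant.split("-")
        if substitution not in subs:
            continue
        background = "-".join(s for s in subs if s != substitution)
        if background == "":
            background, bg_rf = "WT", wt_rf
        elif background in fitness:
            bg_rf = fitness[background]
        else:
            continue  # no measurement for this background
        gain = rf - bg_rf
        if best is None or gain > best[0]:
            best = (gain, background)
    return best
```

A substitution that lowers fitness in the wild-type background but has a positive maximum increase in some other background is the kind of case this search is meant to surface.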

COMPARISON ANALYSIS

  1. script/EpisCount.py: Count reciprocal sign epistasis
    • Input file:
      • result/WSN_MutFitTable.tsv
      • result/HK68_MutFitTable.tsv
    • Output file: result/EpisCountAroundWT.tsv
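
For readers unfamiliar with the term, reciprocal sign epistasis between two substitutions means that the sign of each substitution's fitness effect flips depending on whether the other is present. The sketch below counts such pairs around the wild type using that textbook definition. It is not the code in script/EpisCount.py; the function names, the "-"-joined mutant ID format, and the wild-type RF index of 1 are assumptions.

```python
from itertools import combinations


def is_reciprocal_sign_epistasis(f_wt, f_a, f_b, f_ab):
    """True when each substitution's effect changes sign with the other present."""
    a_alone, a_with_b = f_a - f_wt, f_ab - f_b
    b_alone, b_with_a = f_b - f_wt, f_ab - f_a
    return a_alone * a_with_b < 0 and b_alone * b_with_a < 0


def count_reciprocal_sign_epistasis(fitness, wt_rf=1.0):
    """Return (pairs showing reciprocal sign epistasis, pairs tested).

    `fitness` maps mutant IDs (hypothetical format: substitutions joined
    by "-") to RF indices. Only pairs with both single mutants and the
    double mutant measured are tested.
    """
    singles = sorted(m for m in fitness if "-" not in m)
    count = tested = 0
    for a, b in combinations(singles, 2):
        # Accept either ordering of the double-mutant ID.
        f_ab = fitness.get(f"{a}-{b}", fitness.get(f"{b}-{a}"))
        if f_ab is None:
            continue
        tested += 1
        if is_reciprocal_sign_epistasis(wt_rf, fitness[a], fitness[b], f_ab):
            count += 1
    return count, tested
```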

PLOTTING

FOR WSN

  • script/WSN_HARBS_FitQC.R: Plot the RF index distribution of silent, missense, and nonsense mutations
    • Input file: result/WSN*.count
    • Output file: graph/WSN_FitQC_stripchart.png
  • script/WSN_HARBS_PlotMaxFit.R: Plot the maximum RF index among all mutants that carried a specified substitution as heatmap and also plot their distribution
    • Input file: result/WSN_MaxFitMut_*.tsv
    • Output file: graph/WSN_MaxFitMut_*.png
  • script/WSN_HARBS_CrypticBen.R: Plot the maximum beneficial effect for a given substitution of interest (max. RF increase) among all genetic backgrounds
    • Input file: result/WSN_crypticben.tsv
    • Output file: WSN_CrypticBen*.png

FOR HK68

  • script/HK68_HARBS_FitQC.R: Plot the RF index distribution of silent, missense, and nonsense mutations
    • Input file:
      • result/HK68_Tlib.count
      • result/HK68_MutFitTable.tsv
    • Output file: graph/HK68_FitQC_stripchart.png
  • script/HK68_HARBS_PlotMaxFit.R: Plot the maximum RF index among all mutants that carry a specified substitution as heatmap and also plot their distribution
    • Input file: result/HK68_MaxFitMut_*.tsv
    • Output file: graph/HK68_MaxFitMut_*.png
  • script/HK68_HARBS_CrypticBen.R: Plot the maximum beneficial effect for a given substitution of interest (max. RF increase) among all genetic backgrounds
    • Input file: result/HK68_crypticben.tsv
    • Output file: HK68_CrypticBen*.png

COMPARISON ANALYSIS

  • script/EpisPlot.R: Compare reciprocal sign epistasis by barplot and perform Fisher's exact test
    • Input file: result/EpisCountAroundWT.tsv
    • Output file:
      • graph/WSN_EpisCount.png
      • graph/HK68_EpisCount.png
  • script/HARBS_compare.R: Compare RF index of a given variant in WSN and HK68 backgrounds
    • Input file: result/MutCompareTable.tsv
    • Output file: graph/MutFitCompare.png
  • script/EpiMap.R: Plot the pairwise position of reciprocal sign epistasis
    • Input file:
      • result/WSN_EpisMap.tsv
      • result/HK68_EpisMap.tsv
    • Output file:
      • graph/WSN_EpiMap.png
      • graph/HK68_EpiMap.png
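
The Fisher's exact test mentioned for script/EpisPlot.R is run in R, but the calculation itself is small enough to sketch in Python using only the standard library. The version below computes a two-sided p-value for a 2x2 table by summing the probabilities of all tables that are no more likely than the observed one, with the margins held fixed. Which counts the R script actually places in the table is not stated in this README, so the layout in the usage note is an assumption and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)
```

One plausible layout would be `fisher_exact_two_sided(wsn_yes, wsn_no, hk68_yes, hk68_no)`, where "yes" and "no" are the numbers of tested pairs with and without reciprocal sign epistasis in each strain.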