/HA_Abs

This repository contains the custom scripts and deep learning model code for the paper "An Explainable Language Model for Antibody Specificity Prediction Using Curated Influenza Hemagglutinin Antibodies."

Sequence analysis of influenza hemagglutinin (HA) antibodies

This README describes the analysis in:
An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies

Env setup

If you use conda, create the environment as follows:

conda env create -f Ab_epitope/environment.yml

Dataset

CDR H3 analysis

  1. Extract CDR H3 sequences and references
    python3 script/parse_Ab_table.py

  2. Cluster CDR H3 sequences
    python3 script/CDRH3_clustering_optimal.py

  3. Analyze CDR H3 clustering results
    python3 script/analyze_CDRH3_cluster.py

  4. Analyze CDR H3 properties
    python3 script/analyze_CDRH3_property.py

  5. Create sequence logos for different CDR H3 clusters
    python3 script/CDRH3_seqlogo.py

  6. Plot CDR H3 property for HA head and stem antibodies
    Rscript script/plot_CDRH3_property.R
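
The six steps above depend on each other's outputs, so they are meant to be run in the order listed. The sketch below is one possible way to chain them from the repository root; it is not part of the repository. The helper names (build_commands, run_pipeline) and the dry_run flag are hypothetical, and only the command lines themselves are taken from this README.

```python
import shlex
import subprocess

# Command lines copied from the CDR H3 analysis steps, in order.
CDRH3_STEPS = [
    "python3 script/parse_Ab_table.py",
    "python3 script/CDRH3_clustering_optimal.py",
    "python3 script/analyze_CDRH3_cluster.py",
    "python3 script/analyze_CDRH3_property.py",
    "python3 script/CDRH3_seqlogo.py",
    "Rscript script/plot_CDRH3_property.R",
]


def build_commands(steps):
    """Split each command line into an argument list for subprocess.run."""
    return [shlex.split(step) for step in steps]


def run_pipeline(steps, dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so the pipeline stops at the first failing step.
    """
    for argv in build_commands(steps):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands (hypothetical choice).
    run_pipeline(CDRH3_STEPS, dry_run=True)
```

Calling run_pipeline(CDRH3_STEPS, dry_run=False) from the repository root would execute the steps for real, assuming the conda environment is active and Rscript is on the PATH.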

Germline usage analysis

  1. Clonotype assignment
    python3 script/assign_clonotype.py

  2. Compute germline usage and extract public clonotypes
    python3 script/extract_public_clonotype_VDJ.py

  3. Extract IGHD4-17-encoded head antibodies
    python3 script/analyze_IGHD4-17.py

  4. Analyze the occurrence of the YGD motif in CDR H3
    python3 script/analyze_YGD_motif.py

  5. Plot VDJ gene usage
    Rscript script/plot_VDJgene_freq.R

  6. Plot IGHV/IGK(L)V pairing frequency
    Rscript script/plot_Vpair_heatmap.R

  7. Plot frequency of YGD motif
    Rscript script/plot_YGD_freq.R
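
To illustrate what a motif-occurrence analysis such as step 4 computes, here is a minimal sketch that reports the fraction of CDR H3 sequences containing "YGD" in each antibody group. It is not the code in script/analyze_YGD_motif.py, whose exact logic and input format are not described in this README. The function name motif_frequency, the group labels, and the example sequences are all made up for illustration.

```python
def motif_frequency(cdrh3_by_group, motif="YGD"):
    """Return, per group, the fraction of sequences containing the motif.

    cdrh3_by_group maps a group label to a list of CDR H3 amino acid
    sequences. Matching is a plain case-insensitive substring search.
    Empty groups get a frequency of 0.0.
    """
    freqs = {}
    for group, seqs in cdrh3_by_group.items():
        if not seqs:
            freqs[group] = 0.0
            continue
        hits = sum(1 for seq in seqs if motif in seq.upper())
        freqs[group] = hits / len(seqs)
    return freqs


# Made-up sequences, for illustration only (not from the dataset).
example = {
    "head": ["ARDYGDYFDY", "ARGGSYGDLDY", "ARWLRGAFDI"],
    "stem": ["AKDSSGWYFDV", "ARYGDAFDI"],
}

frequencies = motif_frequency(example)
for group, freq in frequencies.items():
    print(f"{group}: {freq:.2f}")
```

A substring search counts the motif anywhere in the CDR H3; an analysis that restricts the motif to a specific position or reading frame would need an additional check.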

mBLM for specificity prediction

See the Ab_epitope directory.