/IAV-CNN

A 2D convolutional neural network model to predict antigenic variants of influenza A virus through sequence data

Primary LanguagePython

IAV-CNN: a 2D convolutional neural network model to predict antigenic variants of influenza A virus through sequence data

Motivation:The rapid evolution of influenza viruses constantly leads to the emergence of novel influenza strains that are capable of escaping from population immunity. The timely determination of antigenic variants is critical to vaccine design. Empirical experimental methods like hemagglutination inhibition (HI) assays are time-consuming and labor-intensive, requiring live viruses. Recently, many computational models have been developed to predict the antigenic variants without considerations of explicitly modeling the interdependencies between the channels of feature maps. Moreover, the influenza sequences consisting of similar distribution of residues will have high degrees of similarity and will affect the prediction outcome. Consequently, it is challenging but vital to determine the importance of different residue sites and enhance the predictive performance of influenza antigenicity.

Results: We have proposed a 2D convolutional neural network (CNN) model to infer influenza antigenic variants (IVA-CNN). Specifically, we introduce a new distributed representation of amino acids, named ProtVec that can be applied to a variety of downstream proteomic machine learning tasks. After splittings and embeddings of influenza strains, a 2D squeeze-and-excitation CNN architecture is constructed that enables networks to focus on informative residue features by fusing both spatial and channel-wise information with local receptive fields at each layer. Experimental results on three influenza datasets show IVA-CNN achieves state-of-the-art performance combing the new distributed representation with our proposed architecture. It outperforms both traditional machine algorithms with the same feature representations and the majority of existing models in the independent test data.Therefore we believe that our model can be served as a reliable and robust tool for the prediction of antigenic variants.