Primary language: Jupyter Notebook

SPVec-SGCN-CPI

An end-to-end method for compound-protein interaction prediction based on a simplified homogeneous graph convolutional network and a pre-trained language model

The identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly vital for identifying compound-protein interactions (CPIs) accurately, efficiently, and at high throughput, supplementing or replacing traditional experimental techniques that are labor-intensive, time-consuming, and expensive. We present an end-to-end approach termed SPVec-SGCN-CPI. This method uses a simplified graph convolutional network (SGCN) model with low-dimensional, continuous features generated by the SPVec model, together with graph topology information, to predict compound-protein interactions. The SGCN technique separates local neighborhood aggregation from the layer-wise nonlinear propagation steps, which lets it aggregate K-order neighbor information effectively while avoiding neighbor explosion and speeding up training. The performance of SPVec-SGCN-CPI was assessed on three databases and compared against four machine learning and deep learning methods, as well as four state-of-the-art methods. Classification results showed that SPVec-SGCN-CPI outperformed all of these methods, particularly on unbalanced data.
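The K-order aggregation described above can be illustrated with a minimal NumPy sketch. It assumes the SGCN step follows the common simplified-GCN formulation: features are multiplied K times by a symmetrically normalized adjacency matrix with self-loops, and a single linear classifier is applied afterwards. The function names, the toy graph, and the random features standing in for SPVec embeddings are all hypothetical and are not taken from this repository.

```python
import numpy as np

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(adj, features, k):
    """Aggregate K-order neighbor information with no nonlinearity in between."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(k):
        out = s @ out                       # one hop of neighborhood averaging
    return out

# Toy path graph 0-1-2-3 (assumption: a small homogeneous graph).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Placeholder node features; in the described method these would be SPVec vectors.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))

# The propagation is computed once, before any training takes place.
smoothed = propagate(adj, features, k=2)

# A single linear layer then maps the smoothed features to a logit (untrained here).
weights = rng.normal(size=(8, 1))
logits = smoothed @ weights
```

Because the propagation has no trainable parameters, it can be precomputed once for the whole graph, which is one plausible reading of how this design avoids neighbor explosion and speeds up training.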