Predicting Antimalarial Activity in Natural Products using Pre-trained BERT

T-H Nguyen-Vo, Q. Trinh, L. Nguyen, T. T. T. Do, M. C. H. Chua*, B. P. Nguyen*

Motivation

Malaria is one of the most dangerous diseases, causing thousands of deaths and millions of infections annually. For years, many studies have sought potent antimalarial compounds to treat it. Along with chemically synthesized compounds, natural products have also been shown to have strong antimalarial activity. Besides experimental approaches, computational methods have been developed to investigate antimalarial activity in natural products, with satisfactory outcomes. In our study, we construct various prediction models to identify antimalarial natural products using a pre-trained Bidirectional Encoder Representations from Transformers model (called NPBERT) combined with four machine learning algorithms: k-Nearest Neighbours (k-NN), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGB), and Random Forest (RF).
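To illustrate the general idea of feeding fixed-length molecular embeddings into a downstream classifier, here is a minimal sketch of k-NN classification in plain Python. It is not the implementation from this repository: the toy vectors are hypothetical placeholders for encoder output, and the names `toy_embedding` and `knn_predict`, the embedding dimension, and the value of `k` are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors nearest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def toy_embedding(rng, centre, dim=8, spread=0.1):
    """Hypothetical stand-in for an encoder output: a noisy point near `centre`."""
    return [centre + rng.gauss(0.0, spread) for _ in range(dim)]


rng = random.Random(0)

# Label 1 = active, 0 = inactive. This is toy data, not data from the paper.
train_x = [toy_embedding(rng, 1.0) for _ in range(10)]
train_x += [toy_embedding(rng, -1.0) for _ in range(10)]
train_y = [1] * 10 + [0] * 10

# Classify two unseen toy vectors, one drawn near each class centre.
pred_active = knn_predict(train_x, train_y, toy_embedding(rng, 1.0))
pred_inactive = knn_predict(train_x, train_y, toy_embedding(rng, -1.0))
print(pred_active, pred_inactive)
```

In a real pipeline, the toy vectors would be replaced by embeddings produced by the pre-trained encoder, and the classifier would typically come from an established machine learning library rather than being written by hand.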

Results

The results show that the SVM models are the best-performing classifiers, followed by the XGB, k-NN, and RF models. Additionally, a comparative analysis between our proposed molecular encoding schemes and existing state-of-the-art ones indicates that NPBERT works more effectively than the others. Moreover, the use of Transformers in constructing molecular encoders is not limited to this study and can be extended to address numerous biochemical problems.

Availability and Implementation

Source code and data are available on GitHub

Citation

Nguyen-Vo, T. H., Trinh, Q. H., Nguyen, L., Do, T. T., Chua, M. C. H., & Nguyen, B. P. (2021). Predicting Antimalarial Activity in Natural Products Using Pretrained Bidirectional Encoder Representations from Transformers. Journal of Chemical Information and Modeling. DOI: 10.1021/acs.jcim.1c00584

Contact

Go to contact information