Drug_Discovery_Using_ML

A simple data science project on drug discovery, with a small web app that demonstrates the deployed machine learning model.


Drug_Discovery_Using_ML - Basic Overview of the project

ChEMBL is a curated bioactivity database covering more than 2 million compounds. It is compiled from more than 76,000 documents and 1.2 million assays, and its data spans 13,000 targets, 1,800 cells and 33,000 indications. This project uses pre-processed biological activity data from ChEMBL to perform computational drug discovery. The dataset consists of compounds (molecules) that have been biologically tested for their activity towards a target organism or protein of interest.

First, we use the SMILES notation (which represents the chemical structure of each compound) to compute molecular descriptors. We then perform exploratory data analysis, using simple box plots and scatter plots to discern differences between the active and inactive sets of compounds. Next, we compute molecular descriptors with the PaDEL-Descriptor software and prepare the dataset (X and Y data frames) for model building. With the computed molecular descriptors as the X variables, we build a regression model that predicts the pIC50 values (the Y variable). We then quickly build and compare several regression models (quantitative structure-activity relationship, or QSAR, models) of acetylcholinesterase inhibitors using the lazypredict library in Python.

Finally, we deploy the machine learning model as a web app. Essentially, this web app serves as a bioinformatics tool that lets users predict whether a compound of interest has favourable biological activity against the target protein.
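To make the Y variable concrete: pIC50 is the negative base-10 logarithm of the IC50 expressed in molar units. The sketch below assumes the IC50 values are given in nanomolar (nM), which this README does not state. The function name `ic50_nm_to_pic50` is illustrative and is not taken from the project's notebooks.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 value in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive number")
    ic50_molar = ic50_nm * 1e-9  # assumption: input is in nM; 1 nM = 1e-9 M
    return -math.log10(ic50_molar)


# A lower IC50 (a more potent compound) gives a higher pIC50.
for ic50_nm in (10.0, 1_000.0, 100_000.0):
    print(f"IC50 = {ic50_nm:>9.1f} nM -> pIC50 = {ic50_nm_to_pic50(ic50_nm):.2f}")
```

Working on the logarithmic scale spreads the data more evenly than raw IC50 values, which is convenient when the quantity is used as a regression target.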

The application runs as follows:

The required ZIP files are included in the repository and can be downloaded from within the Jupyter notebooks as needed.

Generating the PKL file

The machine learning model used in this web app must first be generated by running the included Jupyter notebook bioactivity_prediction_app.ipynb. Once all code cells have run successfully, a pickled model called acetylcholinesterase_model.pkl is generated.
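For readers unfamiliar with pickling, the following is a minimal sketch of the save-and-load round trip using Python's standard `pickle` module. It is not the notebook's actual code: the helper names `save_model` and `load_model` are hypothetical, and a plain dictionary stands in for the fitted model so that the example runs without any third-party packages.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable object (for example, a fitted estimator) to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a previously pickled object back into memory."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a fitted model; in the project this would be the trained regressor.
stand_in_model = {"coefficients": [0.5, -1.2], "intercept": 4.0}

# A temporary directory keeps the example from touching the real model file.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "acetylcholinesterase_model.pkl"
    save_model(stand_in_model, model_path)
    restored_model = load_model(model_path)

print(restored_model == stand_in_model)
```

The web app would presumably load the .pkl file in the same way before making predictions. Because unpickling can execute arbitrary code, only load pickle files from sources you trust.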

Launching the app

streamlit run drug.py