Machine-learning Analysis of Opioid Use Disorder Informed by MOR, DOR, KOR, NOR and ZOR-Based Interactome Networks
This repository contains the scripts for the paper "Machine-learning Analysis of Opioid Use Disorder Informed by MOR, DOR, KOR, NOR and ZOR-Based Interactome Networks". With these scripts, a machine-learning regression model based on a natural language processing (NLP) method can be built.
OS Requirements
- CentOS Linux 7 (Core)
Python Dependencies
- setuptools (>=18.0)
- python (>=3.7)
- pytorch (>=1.2)
- rdkit (2020.03)
- biopandas (0.2.7)
- numpy (1.17.4)
- scikit-learn (0.23.2)
- scipy (1.5.2)
- pandas (0.25.3)
- cython (0.29.17)
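Before running the scripts, it can help to confirm that the packages listed above are visible to the active interpreter. The following is a minimal sketch and is not part of the repository. It assumes Python 3.8 or newer (for `importlib.metadata`), and the distribution names in `REQUIRED` (for example `torch` for PyTorch) are assumptions; conda-installed packages such as rdkit may not expose pip-style metadata and would then be reported as not found even when they import correctly.

```python
from importlib import metadata

# Distribution names are assumptions and may differ between pip and conda.
REQUIRED = ["setuptools", "torch", "rdkit", "biopandas", "numpy",
            "scikit-learn", "scipy", "pandas", "cython"]


def installed_version(dist_name):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(names=REQUIRED):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


# Print one line per package so missing ones are easy to spot.
for name, version in report().items():
    print(f"{name:15s} {version or 'NOT FOUND'}")
```

Compare the printed versions by hand against the list above; the sketch only reports what is installed and does not enforce the version constraints.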
Download the repository from GitHub
# download repository by git
git clone https://github.com/WeilabMSU/OUD_PPI.git
The feature generation follows the work "Extracting Predictive Representations from Hundreds of Millions of Molecules" by Dong Chen, Jiaxin Zheng, Guo-Wei Wei, and Feng Pan. The pretrained model in their work was built with transformer-based NLP techniques and is used here to generate molecular features.
Download and install the pretrained model under the downloaded OUD_PPI folder.
cd OUD_PPI
bash install-transformer.sh
The input to our feature-generation model is a *.smi file, which stores molecules in SMILES format. The command below generates transformer-based molecular fingerprints. An example *.smi file is provided as MOR.smi.
cd OUD_PPI
python fp-generation.py --path-to-smi MOR.smi
The generated features are saved in the folder "features".
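To check an input file before generating features, a small reader such as the one below can list the SMILES strings it contains. This is a sketch under an assumption: it takes the SMILES to be the first whitespace-separated token on each non-empty line, optionally followed by a molecule name, which is the common *.smi layout. The exact layout that fp-generation.py expects is not documented here, and `read_smiles` is a hypothetical helper, not a function from the repository.

```python
import tempfile
from pathlib import Path


def read_smiles(path):
    """Return the SMILES strings in a .smi file, one per non-empty line.

    Assumes the SMILES is the first whitespace-separated token on a line;
    anything after it (such as a molecule name) is ignored.
    """
    smiles = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        smiles.append(line.split()[0])
    return smiles


# Demonstration on a throwaway file with two molecules and a blank line.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.smi"
    demo.write_text("CCO ethanol\n\nc1ccccc1 benzene\n")
    print(read_smiles(demo))  # ['CCO', 'c1ccccc1']
```

Calling `read_smiles("MOR.smi")` inside the OUD_PPI folder should give the number of molecules to expect as rows in the generated feature file, assuming one feature vector is produced per molecule.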
Below is the script used to build a gradient boosting decision tree (GBDT) machine-learning model from the generated transformer-based molecular fingerprints. An example feature file and label file are provided as MOR.npy and MOR.csv. The generated machine-learning model is saved in the "path-models" folder.
cd OUD_PPI
python build-GBDT-regression.py --feature_path features/MOR.npy --label_path MOR.csv --save_model_name MOR
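For readers who want to see the general shape of a GBDT regression on a feature matrix, here is a minimal sketch using scikit-learn's `GradientBoostingRegressor`. It is not the contents of build-GBDT-regression.py. The data are synthetic stand-ins for features/MOR.npy and the labels in MOR.csv, and the hyperparameters are illustrative assumptions, not the values used in the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 200 "molecules" with 16 features each, and a label
# that depends on the first feature plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Hold out 20% of the samples to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters are illustrative only (assumed, not taken from the paper).
model = GradientBoostingRegressor(
    n_estimators=200,
    max_depth=3,
    learning_rate=0.05,
    subsample=0.8,
    random_state=0,
)
model.fit(X_train, y_train)

# Pearson correlation between predicted and true held-out labels.
pred = model.predict(X_test)
r = np.corrcoef(y_test, pred)[0, 1]
print(f"Pearson R on held-out data: {r:.3f}")
```

With real data, `X` would be loaded with `np.load("features/MOR.npy")` and `y` read from the label file. The column layout of MOR.csv is not described here, so check it before adapting the sketch.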
- Hongsong Feng, Rana Elladki, Jian Jiang, and Guo-Wei Wei. Machine-learning Analysis of Opioid Use Disorder Informed by MOR, DOR, KOR, NOR and ZOR-Based Interactome Networks. Computers in Biology and Medicine, 2023.
- Dong Chen, Jiaxin Zheng, Guo-Wei Wei, and Feng Pan. Extracting Predictive Representations from Hundreds of Millions of Molecules. The Journal of Physical Chemistry Letters, 12(44):10793–10801, 2021.
All code released in this study is under the MIT License.