This is the official implementation of the Virtual Screening Assistant Network (ViscaNet). The modules in this repository are motivated by *Analyzing Learned Molecular Representations for Property Prediction* and *A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility*. The base code was obtained from chemprop.
The following setup is common to all the experiments.
- Install Anaconda.
- Download the GitHub repository, including `viscanet.yml`.
- Open a terminal and navigate to the location where `viscanet.yml` was downloaded. Then run
```
conda env create -f viscanet.yml
```
For more details, check the conda documentation.
- Once the environment is created, activate it with
```
conda activate viscanet
```
The following steps can be used for any experiment where the data is not split into multiple files.
- Add the data files to the `./data/` directory.
- Run all the cells of `preprocess_fda.ipynb`.
- Run
```
python train_fda.py --data_path (path to data file) --dataset_type classification --smiles_column s_sd_SMILES --target_columns r_i_docking_score --epochs 30 --num_folds 1 --features_path (path to features) --attention --separate_test_path ./data/fda.csv --separate_test_features_path ./data/fda.npy
```
  - Sample `data_path` = `./data/fda.csv`
  - Sample `features_path` = `./data/fda.npy`
  - If you want to split one data file into train/val/test, do not use `--separate_test_path` and `--separate_test_features_path`.
  - If you have separate val or test files, use `--separate_val_path` and `--separate_val_features_path`, and `--separate_test_path` and `--separate_test_features_path`.
  - If you need attention images, use `--viz_dir` and give the path to the location where you want to store those images.
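The file passed via `--features_path` is expected to hold one precomputed feature row per molecule in the matching CSV. A minimal sketch of that layout and the alignment check it implies (the SMILES strings and feature width below are illustrative, not the repository's actual values):

```python
import numpy as np

# Hypothetical sanity check: a features file such as ./data/fda.npy is
# expected to hold one feature row per molecule in the matching CSV.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]        # stand-in for the s_sd_SMILES column
features = np.random.rand(len(smiles), 200)    # stand-in for precomputed features

# Row counts must agree, or molecules and features will be misaligned.
assert features.shape[0] == len(smiles)

np.save("features_example.npy", features)      # same on-disk format as fda.npy
loaded = np.load("features_example.npy")
print(loaded.shape)  # (3, 200)
```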
- This will give the test scores and create `TruePositives.csv` and `FalsePositives.csv` in the `./inference/` directory.
- Run all the cells of `get_fdaid.ipynb`. This will add the `drugbank_ID` to the generated files mentioned above.
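The ID-attachment step can be pictured as a SMILES-keyed lookup joined onto the inference outputs. The sketch below is only a guess at the shape of what `get_fdaid.ipynb` does; the column names, SMILES, and DrugBank IDs are illustrative, not taken from the notebook:

```python
# Hypothetical sketch of attaching a drugbank_ID column to the inference
# outputs by matching on SMILES. All names and IDs below are illustrative.
lookup = {"CCO": "DB00898", "CC(=O)O": "DB00945"}   # assumed SMILES -> DrugBank map

rows = [{"s_sd_SMILES": "CCO", "score": 0.91},
        {"s_sd_SMILES": "CC(=O)O", "score": 0.83}]  # stand-in for TruePositives.csv rows

for row in rows:
    row["drugbank_ID"] = lookup.get(row["s_sd_SMILES"], "")

print([r["drugbank_ID"] for r in rows])  # ['DB00898', 'DB00945']
```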
The following steps can be used for any experiment where the data is split into multiple files.
- Add the data files to the `./data/nsp1_supernaturaldb_sift_data/` directory. Also, create the directories `new_data`, `new_data_feats`, and `new_data_norm_feats` inside the `data` directory.
- Run `python preprocess_new_data.py` to preprocess the supernatural data. The processed data will be stored in the `./data/new_data/` directory.
- Use `python feature.py` to generate the feature (`.npy`) files for both the train and test data. These will be stored in the `./data/new_data_feats/` directory.
- Generate normalized features for both the train and test data by executing `python feature_normalize.py`. These will be stored in the `./data/new_data_norm_feats/` directory.
- Split the data and the obtained normalized features into train, val, and test sets in the `./data/new_data/` and `./data/new_data_norm_feats/` directories respectively.
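One way to carry out the split step is sketched below. The 80/10/10 ratios and in-place subdirectories are assumptions; whatever scheme you use, apply the same seed and ordering to `./data/new_data/` and `./data/new_data_norm_feats/` so the data files and their feature files stay paired.

```python
import random
import shutil
import tempfile
from pathlib import Path

def split_files(src_dir, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle the files in src_dir and move them into train/val/test subdirs."""
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    random.Random(seed).shuffle(files)
    n_train = int(ratios[0] * len(files))
    n_val = int(ratios[1] * len(files))
    splits = {"train": files[:n_train],
              "val": files[n_train:n_train + n_val],
              "test": files[n_train + n_val:]}
    for name, members in splits.items():
        out = Path(src_dir) / name
        out.mkdir(exist_ok=True)
        for f in members:
            shutil.move(str(f), str(out / f.name))
    return {k: len(v) for k, v in splits.items()}

# Demonstrate on ten dummy files in a temporary directory.
tmp = Path(tempfile.mkdtemp())
for i in range(10):
    (tmp / f"part_{i}.csv").write_text("smiles,score\n")
counts = split_files(tmp, seed=42)
print(counts)  # {'train': 8, 'val': 1, 'test': 1}
```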
- Run `python os_train.py`. This will train the model on all the data files available in the `./data/new_data/train/` directory.
- The model will be stored in the `./model_checkpoints/` directory, which is created automatically. Every time you run `python os_train.py`, it trains the model for one more epoch. For example, if you run `python os_train.py` 5 times in sequence, the model will have been trained for 5 epochs.
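The one-epoch-per-invocation behaviour can be pictured as a load, train, save loop. The sketch below mimics it with a toy JSON checkpoint; the file name and contents are stand-ins, not the actual `./model_checkpoints/` format:

```python
import json
from pathlib import Path

# Hypothetical mock of the resume-per-run pattern: each invocation of
# os_train.py loads the latest checkpoint, trains one more epoch, and
# saves. The JSON file here is a stand-in, not the real checkpoint format.
CKPT = Path("checkpoint_example.json")

def run_once():
    state = json.loads(CKPT.read_text()) if CKPT.exists() else {"epoch": 0}
    state["epoch"] += 1          # stand-in for one real training epoch
    CKPT.write_text(json.dumps(state))
    return state["epoch"]

for _ in range(5):               # equivalent to running os_train.py five times
    epoch = run_once()
print(epoch)  # 5
```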
- To test the model, run
```
python os_test.py --target_columns r_i_docking_score --dataset_type classification --epochs 1 --num_folds 1 --no_features_scaling --data_path ./
```
  - The paths are hard-coded inside `os_test.py`, so `--data_path` can be anything.
  - If you need attention images, use `--viz_dir` and give the path to the location where you want to store those images.

NOTE: Kindly fix the paths in case an error is thrown, or feel free to contact me or raise an issue.