by Sumit Tarafder and Debswapna Bhattacharya
Codebase for our locality-aware invariant Point Attention-based RNA ScorEr (lociPARSE).
pip install lociPARSE
Or
git clone https://github.com/Bhattacharya-Lab/lociPARSE.git
cd lociPARSE
pip install .
Typical installation should take less than a minute on a 64-bit Linux system.
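To verify the installation, the scorer can be imported and instantiated directly. This is a minimal sanity check using only the import and constructor documented below:
# Minimal post-install sanity check: confirm the package imports and the scorer constructs.
from lociPARSE import lociparse
lp = lociparse()
print("lociPARSE is installed and the scorer was created successfully.")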
Instructions for running lociPARSE:
from lociPARSE import lociparse
lp = lociparse()
score = lp.score("R1108.pdb")
Additional functionality
score.pMoL.show() # Returns the pMoL value
score.pNuL.show() # Returns a list of pNuL values
score.pNuL.show(1) # Returns the pNuL value of the 1st nucleotide
score.save("score.txt") # Saves the scores
- Given an RNA PDB file "R1108.pdb" as input, lociPARSE predicts both a molecular-level lDDT score (pMoL) and nucleotide-wise lDDT scores (pNuL).
- Use the show() function to print the pMoL or pNuL values.
- Save the output to a filename of your choice ("score.txt"). The first line shows the pMoL score. Each of the subsequent lines specifies 2 columns: column 1 is the nucleotide index in the PDB and column 2 is the predicted nucleotide-wise lDDT (pNuL) score.
Inference for a typical RNA structure (~70 nucleotides) should take a few seconds.
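For downstream analysis, the saved score file can be parsed with a few lines of Python. This is a minimal sketch, assuming the layout described above (pMoL on the first line, then one "nucleotide-index pNuL" pair per line) and the example filename "score.txt":
# Minimal sketch for parsing a lociPARSE score file ("score.txt" is the example filename used above).
# Assumes line 1 holds the pMoL score and each following line holds "<nucleotide index> <pNuL>".
with open("score.txt") as fh:
    rows = [line.split() for line in fh if line.strip()]
pMoL = float(rows[0][-1])  # molecular-level lDDT estimate (last token, in case the line is labeled)
pNuL = {int(idx): float(val) for idx, val in rows[1:]}  # per-nucleotide lDDT estimates
print(f"pMoL = {pMoL:.3f}")
print("Lowest-confidence nucleotide index:", min(pNuL, key=pNuL.get))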
- The lists of IDs in our training set, test sets, and the validation set used in the ablation study are available here.
- The training set and the test set of 30 independent RNAs were taken from trRosettaRNA.
- CASP15 experimental structures and all submitted predictions were downloaded from CASP15.
- The set of 60 non-redundant RNA targets (TS60) used for hyperparameter optimization was curated in-house. See https://doi.org/10.1093/biomethods/bpae047 for more details.
If you want to train or evaluate lociPARSE, please follow these initial steps:
- Download the necessary materials from here and place them in the root directory (/lociPARSE):
wget https://zenodo.org/records/12729167/files/Materials.tar.gz
- Extract the Materials.tar.gz archive:
tar -xvzf Materials.tar.gz --strip-components=1
- Make sure you have installed a PyTorch version compatible with the CUDA version on your machine for GPU training; a quick check is sketched below. See https://pytorch.org/get-started/locally/ for more details.
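The following snippet is a generic check (not specific to lociPARSE) that the installed PyTorch build can actually see your GPU:
# Quick check that the installed PyTorch build matches a usable CUDA setup.
import torch
print("PyTorch version:", torch.__version__)
print("CUDA version PyTorch was built against:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g., an A100 for the reported training time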
If you wish to train lociPARSE from scratch on our training set, please follow these steps:
- Download our training dataset Train.tar.gz from here and place it inside the Input/Dataset folder.
- Extract the training dataset:
tar -xzvf Train.tar.gz
- Run the following command to train our architecture:
chmod a+x lociPARSE_train.sh && ./lociPARSE_train.sh > log.txt
It will take approximately 16 hours to finish feature generation and 50 epochs of training on a single A100 GPU.
- The best model, selected by validation loss, will be saved inside the Model folder as "QAmodel_retrained.pt".
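To confirm the checkpoint was written, it can be opened as a standard PyTorch file. The sketch below only lists the top-level contents, since the exact layout (a bare state_dict vs. a wrapper dictionary) depends on the training script:
# Inspect the retrained checkpoint produced by lociPARSE_train.sh.
# The internal layout is an assumption here, so we only report what the file contains.
import torch
ckpt = torch.load("Model/QAmodel_retrained.pt", map_location="cpu")
if isinstance(ckpt, dict):
    print("Top-level keys:", list(ckpt.keys()))
else:
    print("Loaded object of type:", type(ckpt).__name__)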
If you want to generate our reported results in the paper from the provided predictions, follow these steps:
- To generate Tables 1-6, please run the following commands one by one:
cd Evaluate
python3 QA_eval.py Test30_CASP15 0
python3 QA_eval.py ARES_benchmark2 0
- You will find the corresponding results inside the Evaluate/Results folder.
- To generate Supplementary Figures S1-S2, please run the following commands:
cd Evaluate
python3 draw.py
- Generated figures will be placed inside the Evaluate/Figures folder.
If you want to predict the scores by lociPARSE from scratch and re-evaluate, follow these steps:
- Download our test datasets Test.tar.gz and Ares_set.tar.gz from here and place them inside the Input/Dataset folder.
- Extract the archives:
tar -xzvf Test.tar.gz
tar -xzvf Ares_set.tar.gz
- To predict and evaluate results on our two test sets, Test30 and CASP15 (Tables 1-5), please run the following command:
chmod a+x evaluate.sh && ./evaluate.sh Test30_CASP15 Model/QAmodel_lociPARSE.pt
- To predict and evaluate results on the ARES benchmark set-2 (Table 6), please run the following command. [This will be slow due to the ~76k models in this test set.]
chmod a+x evaluate.sh && ./evaluate.sh ARES_benchmark2 Model/QAmodel_Ares_set.pt
- You will find the corresponding results inside the Evaluate/Results folder.