This is the implementation of the paper entitled "Rationalizing Translation Elongation by Reinforcement Learning". Here we provide several functions, as follows:
- Reproduce some results in our paper (cross-validation performance)
- Train a neural network model to predict ribosome densities across a set of transcripts
- Obtain the predicted ribosome densities for a particular gene
- Train a neural network model on a custom dataset
Riboexp is written in Python version 2.7, and requires the following libraries:
cuda 8.0
python 2.7
pytorch 0.3.1
numpy 1.14.3
scipy 1.0.1
pip install -r requirement.txt
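After installation, the pinned versions listed above can be checked with a short script. This is a minimal sketch rather than part of Riboexp: the helper name `check_versions` is hypothetical, and it assumes that pytorch is imported under its usual module name `torch`. CUDA is not checked here.

```python
from __future__ import print_function

import importlib

# Versions taken from the requirements listed in this README.
# "torch" is assumed to be the import name of pytorch.
EXPECTED = {"torch": "0.3.1", "numpy": "1.14.3", "scipy": "1.0.1"}


def check_versions(expected):
    """Return {module: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The library is not installed in this environment.
            report[name] = (None, False)
            continue
        installed = getattr(module, "__version__", None)
        # Prefix match, so build suffixes such as "0.3.1.post2" still pass.
        ok = installed is not None and str(installed).startswith(wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in sorted(check_versions(EXPECTED).items()):
        status = "OK" if ok else "MISMATCH"
        print(name, "installed:", installed, "expected:", EXPECTED[name], status)
```

A `MISMATCH` line does not necessarily mean Riboexp will fail, only that the environment differs from the one listed above.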
Command:
python runRiboexp.py --data_path ../data/yeast --mode 1 --fold {foldnumber} --load {filename}
Users need to choose the fold number (0, 1, or 2) and the path of the model trained on that fold. For example,
python runRiboexp.py --data_path ../data/yeast/ --mode 1 --load ../data/trained_models/model_yeast_useStructure_True_fold_0_nhids_512_drate_0.4_mark_mark_lam_0.0083-Jul28.11h52/best-model.pkl --fold 0
The results are obtained quickly and should be consistent with our computational tests.
Command:
python runRiboexp.py --data_path ../data/yeast --mode 2 --load0 {trained model of fold 0} --load1 {trained model of fold 1} --load2 {trained model of fold 2}
Example:
python runRiboexp.py --data_path ../data/yeast --mode 2 --load0 ../data/trained_models/model_yeast_useStructure_True_fold_0_nhids_512_drate_0.4_mark_mark_lam_0.0083-Jul28.11h52/best-model.pkl --load1 ../data/trained_models/model_yeast_useStructure_True_fold_1_nhids_512_drate_0.4_mark_mark_lam_0.0083-Jul28.13h32/best-model.pkl --load2 ../data/trained_models/model_yeast_useStructure_True_fold_2_nhids_512_drate_0.4_mark_mark_lam_0.0083-Jul28.15h03/best-model.pkl
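Because the `--mode 2` command takes one model file per fold, the long command line is easy to mistype. The sketch below assembles it from a list of three paths. The helper name `build_mode2_command` and the shortened model paths are hypothetical; only the flags shown in the command above are used.

```python
def build_mode2_command(data_path, model_paths):
    """Build the argument list for `--mode 2` from three per-fold model files."""
    if len(model_paths) != 3:
        raise ValueError("expected one trained model per fold (3 in total)")
    cmd = ["python", "runRiboexp.py", "--data_path", data_path, "--mode", "2"]
    # --load0, --load1, --load2 receive the models of folds 0, 1, and 2.
    for fold, path in enumerate(model_paths):
        cmd += ["--load%d" % fold, path]
    return cmd


# Placeholder paths for illustration; substitute the real best-model.pkl files.
example = build_mode2_command(
    "../data/yeast",
    ["fold0/best-model.pkl", "fold1/best-model.pkl", "fold2/best-model.pkl"],
)
```

The resulting list can be handed to `subprocess.call(example)` from the directory that contains `runRiboexp.py`.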
Command:
python runRiboexp.py --data_path ../data/yeast --fold {foldnumber}
The training procedure takes about 1-2 hours, depending on the GPU device. This tool saves the best parameters of Riboexp according to the performance on the held-out data (validation data).
Command:
python runRiboexp.py --data_path ../data/yeast
When the fold number is not set, the tool trains three models, one per fold of the three-fold cross-validation, and saves each of them separately.
Users are able to use --no_structure to train/analyze the data where the structure information is unavailable.
Command:
python runRiboexp.py --data_path ../data/yeast --no_structure
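The training commands above can also be generated per fold, for example to launch each fold on a different GPU or at a different time. This is a sketch under the assumption that training folds 0, 1, and 2 one by one with `--fold` is equivalent to omitting `--fold`; the helper name `fold_training_commands` is hypothetical.

```python
def fold_training_commands(data_path, folds=(0, 1, 2), use_structure=True):
    """Return one training command (as an argument list) per fold."""
    commands = []
    for fold in folds:
        cmd = ["python", "runRiboexp.py", "--data_path", data_path,
               "--fold", str(fold)]
        if not use_structure:
            # Train without RNA structure features.
            cmd.append("--no_structure")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in fold_training_commands("../data/yeast", use_structure=False):
        print(" ".join(cmd))
```

The script only prints the commands; running them is left to the user.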
Note: This function only supports models trained without RNA structure features. Command:
python runRiboexp.py --data_path ../data/yeast --mode 3 --load {trainedmodel} --fold 0 --no_structure --gene_path {genefile}
Example:
python runRiboexp.py --data_path ../data/yeast --mode 3 --fold 0 --no_structure --gene_path ../data/gene.test.txt --load ../data/trained_models/model_yeast_useStructure_False_nhids_512_drate_0.4_mark_mark_lam_0.0083-Jul28.09h34/best-model.pkl
- Create a new folder in ./data/, e.g. human.
- Divide your data into three folds, each containing three files that store the training, validation, and testing data, respectively.
- Put the name, codons, and corresponding footprint counts of each gene on separate lines in the file. Please refer to ./data/fold0/ for more details.
- Place the three-fold data in the human folder, named "fold0", "fold1", and "fold2", respectively.
python runRiboexp.py --data_path {your dataset} --no_structure
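The folder layout described in the steps above can be sketched as follows. Several details here are assumptions and should be checked against the provided yeast data: the file names `train.txt`, `valid.txt`, and `test.txt` are hypothetical, codons and counts are assumed to be space-separated, and the way genes are assigned to training, validation, and testing is only one possible scheme.

```python
from __future__ import print_function

import os
import tempfile


def write_genes(path, genes):
    """Write (name, codons, counts) records, one field per line."""
    with open(path, "w") as handle:
        for name, codons, counts in genes:
            handle.write(name + "\n")
            handle.write(" ".join(codons) + "\n")  # assumed separator
            handle.write(" ".join(str(c) for c in counts) + "\n")


def make_three_folds(genes, out_dir):
    """Create fold0, fold1, fold2 under out_dir and return the set sizes."""
    parts = [genes[i::3] for i in range(3)]
    summary = {}
    for fold in range(3):
        fold_dir = os.path.join(out_dir, "fold%d" % fold)
        if not os.path.isdir(fold_dir):
            os.makedirs(fold_dir)
        test = parts[fold]
        rest = parts[(fold + 1) % 3] + parts[(fold + 2) % 3]
        # Assumption: every 10th remaining gene is held out for validation.
        valid = rest[::10]
        train = [g for i, g in enumerate(rest) if i % 10 != 0]
        # Hypothetical file names; match them to the yeast example data.
        write_genes(os.path.join(fold_dir, "train.txt"), train)
        write_genes(os.path.join(fold_dir, "valid.txt"), valid)
        write_genes(os.path.join(fold_dir, "test.txt"), test)
        summary[fold] = {"train": len(train), "valid": len(valid),
                         "test": len(test)}
    return summary


if __name__ == "__main__":
    demo = [("gene%d" % i, ["ATG", "GCT", "TAA"], [3, 0, 7]) for i in range(30)]
    print(make_three_folds(demo, tempfile.mkdtemp()))
```

Each gene appears in the test set of exactly one fold, so the three test sets together cover the whole dataset.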
If you have any questions, please feel free to contact me :) Email: liuxg16@mails.tsinghua.edu.cn