- R >= 3.5
- R packages: randomForest, glmnet, e1071 and kernlab
git clone git@github.com:kaixuanDeng95/computational-CRISPR-strategy.git
First, use the Python script "kmer.py" to generate a feature file from an input file in FASTA format.
Then the trained RF model, stored in the Rdata file "RF.model.Rdata", can be used to predict the Z_scores of the query sequences.
A FASTA file with two DNA sequences is used for demonstration. The FASTA file "example.fasta" is transformed into the feature file "example_7mer.txt" by the Python script "kmer.py". The feature file is then passed to the trained RF model to obtain the predicted Z_scores.
>chr6:36634989-36635089
TCTGGCACCCTGCAAGGCCGCATGATGATGCAACAATGCAACAAAAGACAAGCCCGGGCAAGGCCAGCGGGAGCTCTGCCGGCCAGAGTTGCTGATGCGA
>chr6:36635104-36635204
TGGGGAGGGTGTTTCAGGGCTGCAGGGAAGTGGGAGGCCCCAACTGCCCAGGAGGCAAAACTGGCCTCCTGCTCACTCAGCCATGAGCTTTTCTACCCCA
The feature file is a text file with 2 rows (one per sequence) and 16384 columns (one per possible 7-mer, since 4^7 = 16384).
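To make the shape of the feature file concrete, here is a minimal Python sketch of 7-mer featurization. It is not the repository's "kmer.py": the function names `read_fasta` and `kmer_counts`, the use of raw counts (rather than frequencies), the lexicographic A/C/G/T column order, and the skipping of windows that contain non-ACGT characters are all assumptions made for illustration. Features for the trained model should always be generated with the provided "kmer.py".

```python
from itertools import product

K = 7
ALPHABET = "ACGT"

# All 4**7 = 16384 possible 7-mers. Lexicographic order is an assumption;
# the column order used by the repository's kmer.py may differ.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def kmer_counts(seq):
    """Return a list of 16384 counts, one per possible 7-mer (hypothetical helper)."""
    counts = [0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is not None:  # windows containing e.g. N are skipped (assumption)
            counts[idx] += 1
    return counts


if __name__ == "__main__":
    demo = ">seq1\nACGTACGTA\n>seq2\nAAAAAAAA\n"
    for name, seq in read_fasta(demo):
        row = kmer_counts(seq)
        print(name, len(row), sum(row))
```

Each sequence yields one row of 16384 values, so a FASTA file with two sequences gives a 2 x 16384 matrix, matching the dimensions stated above.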
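library(randomForest)
# Read the 7-mer feature matrix (one row per query sequence)
x <- as.matrix(read.table("example_7mer.txt"))
# Load the trained model object RF.model
load("RF.model.Rdata")
# Predict the Z_scores of the query sequences
y_pred <- predict(RF.model, x)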
> y_pred
4.7719043 0.1287047