The scripts in this repository use the siRNA dataset created by Huesken et al. (Huesken, Nature Biotechnology, 2005). The goal is to use this dataset to develop two models: a regression model that predicts the efficacy of a given siRNA molecule for RNA interference, and a classification model that predicts whether a given siRNA molecule has the desired potency for RNA interference.
The dataset comprises guide siRNA sequences from mouse and human, together with their measured activity against the targeted mRNA sequences.
The features were calculated using the calc_features_v2 script.
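
To give a rough idea of what sequence-derived features for a guide siRNA can look like, here is a minimal sketch. It is not the contents of calc_features_v2: the function name simple_features, the feature names, and the choice of features (overall GC content plus a per-position one-hot encoding) are assumptions made for illustration only.

```python
from typing import Dict

NUCLEOTIDES = "ACGU"


def simple_features(guide: str) -> Dict[str, float]:
    """Illustrative features for one guide sequence (hypothetical helper,
    not the feature set produced by calc_features_v2)."""
    # Accept DNA-style input by mapping T to U.
    seq = guide.upper().replace("T", "U")
    if not seq or any(base not in NUCLEOTIDES for base in seq):
        raise ValueError(f"unexpected characters in sequence: {guide!r}")

    # Fraction of G and C nucleotides over the whole guide.
    features = {"gc_content": sum(base in "GC" for base in seq) / len(seq)}

    # One-hot encoding of the nucleotide at each position (1-based).
    for position, base in enumerate(seq, start=1):
        for nucleotide in NUCLEOTIDES:
            key = f"pos{position}_{nucleotide}"
            features[key] = 1.0 if base == nucleotide else 0.0
    return features
```

A feature dictionary of this kind could be built for every guide sequence in the dataset and paired with the measured activity to form the inputs for the regression and classification models.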