TnTWoW/econvRBP
RNA-binding proteins (RBPs) govern RNA processing from synthesis to decay and play a significant role in RNA transport, translation, and degradation. Predicting RBP function from the amino acid sequence with computational methods has therefore become an important topic in genome annotation. However, no fully successful method exists yet, for three reasons: (1) Shallow features: although sequence determines structure, it is difficult to extract the essential features from the raw sequence alone. (2) Limited understanding: feature-based prediction methods rely mainly on feature extraction, but our incomplete understanding of proteins limits what feature engineering can achieve. (3) Feature fusion: multi-feature fusion is often used in RBP prediction, but the features are not well integrated. In view of these challenges, we propose a novel ensemble convolutional neural network (econvRBP) to predict RBPs. We also provide a web server that biologists in this field can use to check other candidate RBPs. To capture the local and global features of RNA-binding proteins simultaneously, Conjoint Triad and One-Hot encoding are first used to transform the amino acid sequence into local and global features, respectively. The local and global features are then combined with an ensemble method, and convolutional neural networks extract higher-level features from them. We evaluated the method with 10-fold cross-validation, and the results show that it achieves the best performance among all predictors so far. We correctly predicted 97% of 2875 RBPs and 99% of 6872 non-RBPs, with an accuracy of 0.99, a Matthews correlation coefficient (MCC) of 0.99, a precision of 0.99, and an area under the curve (AUC) of 0.99. In addition, the training and testing sets provided by RBPPred are used to validate our models. Homologous sequences in the training set are removed at a threshold of 25%.
econvRBP achieves an accuracy of 0.87 on both the processed training set and the testing set. These results indicate that econvRBP is currently the best-performing method and can provide reliable guidance for the detection of RBPs.
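The two encodings named in the description can be illustrated with a minimal sketch. This is not the repository's code. The seven-class amino acid grouping is the one commonly used for Conjoint Triad and is assumed here, since the description does not state it. The frequency normalization, the handling of unknown residues, and the `max_len` padding length are likewise hypothetical choices made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Seven-class grouping commonly used for Conjoint Triad (an assumption;
# the description does not say which grouping econvRBP uses).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}

# Map every ordered triple of classes to one of 7 * 7 * 7 = 343 positions.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(seq):
    """Return a 343-dimensional vector of class-triad frequencies.

    Each window of three consecutive residues is mapped to its triple of
    classes. Windows containing a non-standard residue are skipped.
    Counts are divided by the number of counted windows (assumed
    normalization).
    """
    counts = [0] * len(TRIAD_INDEX)
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if all(aa in AA_TO_CLASS for aa in window):
            key = tuple(AA_TO_CLASS[aa] for aa in window)
            counts[TRIAD_INDEX[key]] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def one_hot(seq, max_len=8):
    """Return a max_len x 20 one-hot matrix for the sequence.

    Sequences longer than max_len are truncated and shorter ones are
    padded with all-zero rows. Non-standard residues also become
    all-zero rows. max_len is a hypothetical parameter.
    """
    rows = []
    for aa in seq[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows)))
    return rows
```

Under these assumptions, `conjoint_triad` summarizes short three-residue patterns wherever they occur, while `one_hot` preserves the position of every residue across the whole sequence. How econvRBP combines the two representations and feeds them to its convolutional networks is not shown here.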
CSS