An integrated R package for effective prediction of lncRNA- and ncRNA-protein interaction
Understanding ncRNA-protein interaction is of critical importance to unveil ncRNAs' functions. Many computational tools have now been developed to facilitate research on ncRNA-protein interaction. Nonetheless, most of these tools show unstable results and lack the flexibility required for dataset-specific prediction. Here we propose an integrated package, LION, which comprises a new method for predicting ncRNA/lncRNA-protein interaction as well as a comprehensive strategy to meet the requirements of customisable prediction. As an integrated tool for predicting ncRNA-protein interaction, LION can be used to build adaptable models for species- and tissue-specific prediction and can considerably enhance the performance of several widely used tools. Experimental results also demonstrate that our method outperforms its competitors on multiple benchmark datasets. We expect LION to be a powerful and efficient tool for the prediction and analysis of ncRNA- and lncRNA-protein interaction.
Thank you for checking out our LION @BigCatZoo!
If you have any questions regarding LION, please email the zookeeper Siyu Han (siyu.han@tum.de) or post them to Issues.
Using devtools
# Enter the following command in R:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("HAN-Siyu/LION")
Or download the source package and install it manually.
Versions below v0.2.9.1 have an issue in calculating metrics. The issue did not affect the results reported in our paper. We recommend using the latest version. Update details can be found in NEWS.
Almost all dependencies will be installed automatically when LION is installed in R. However, secondary structure features are computed using two standalone programmes, RNAsubopt (from the ViennaRNA Package) and Predator. You need to install both if you would like to use the lncPro method or extract structural features.
- ViennaRNA Package: https://www.tbi.univie.ac.at/RNA/
- Predator: https://bioweb.pasteur.fr/packages/pack@predator@2.1.2
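Before using lncPro or the structural features, it can help to confirm that R can actually find both programmes. The sketch below uses base R's `Sys.which()`; the helper name `check_external_tools()` is hypothetical (not part of LION), and the executable names `RNAsubopt` and `predator` are assumptions that may differ on your system.

```r
# Hypothetical helper (not part of LION): report whether external programmes
# can be found on the system PATH. The executable names are assumptions.
check_external_tools <- function(tools = c("RNAsubopt", "predator")) {
  paths <- Sys.which(tools)            # "" for any tool that is not found
  found <- nzchar(paths)
  names(found) <- tools
  for (tool in tools) {
    if (found[[tool]]) {
      message(tool, " found at ", paths[[tool]])
    } else {
      message(tool, " not found on PATH")
    }
  }
  invisible(found)
}

check_external_tools()
```

If a programme is reported as missing, add its installation directory to `PATH` before starting R.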
[PDF Manual] [Datasets and Raw Results]
We expect LION to be a powerful package for predicting RNA-protein interaction in a uniform R environment. The functions of LION can be grouped into several categories that cover feature extraction, interaction prediction and model tuning. Below is a basic summary of LION's functions. Detailed examples and parameter explanations can be found in our manual.
Functions for feature extraction
computeFreq(): compute k-mer frequencies of RNA/protein sequences. Supports three amino acid representations, entropy density profile (EDP) computation and data normalization.
computeMLC(): compute the most-like coding (MLC) region of RNA sequences. Supports two strategies: longest open reading frame (ORF) and maximum subarray sum (MSS).
computeMotifs(): compute the number of motifs in RNA/protein sequences. User-defined motifs are also supported.
computePhysChem(): compute physicochemical features of RNA/protein sequences.
computePhysChem_AAindex(): compute various physicochemical features of protein sequences using AAindex.
computeStructure(): compute the secondary structural features of RNA/protein sequences using ViennaRNA/Predator (both programmes are required).
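As a rough illustration of what k-mer frequencies and EDP mean, the sketch below counts overlapping k-mers in an RNA sequence and converts the frequencies into an entropy density profile using the common definition EDP_i = -f_i log2(f_i) / H, where H is the Shannon entropy of the k-mer distribution. The functions `kmer_freq()` and `edp()` are hypothetical helpers, not LION's `computeFreq()`, whose exact normalization and options may differ.

```r
# Sketch (not LION's computeFreq()): overlapping k-mer frequencies and the
# entropy density profile (EDP) derived from them.
kmer_freq <- function(seq, k = 2, alphabet = c("A", "C", "G", "U")) {
  chars <- strsplit(toupper(seq), "")[[1]]
  # Every possible k-mer over the alphabet, sorted lexicographically
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  all_kmers <- sort(do.call(paste0, grid))
  counts <- setNames(numeric(length(all_kmers)), all_kmers)
  n_windows <- length(chars) - k + 1
  for (i in seq_len(max(n_windows, 0))) {
    kmer <- paste(chars[i:(i + k - 1)], collapse = "")
    # Skip windows containing symbols outside the alphabet (e.g. "N")
    if (kmer %in% all_kmers) counts[kmer] <- counts[kmer] + 1
  }
  total <- sum(counts)
  if (total > 0) counts / total else counts
}

# EDP_i = -f_i * log2(f_i) / H, where H = -sum_j f_j * log2(f_j)
edp <- function(freq) {
  terms <- ifelse(freq > 0, -freq * log2(freq), 0)
  H <- sum(terms)                      # Shannon entropy of the k-mer distribution
  out <- if (H > 0) terms / H else terms
  setNames(out, names(freq))
}

f2 <- kmer_freq("AUGGCUACGUAG", k = 2)
round(edp(f2), 3)
```

Because each EDP value is a share of the total entropy, the profile always sums to 1 (unless the sequence has no valid k-mers), which makes sequences of different lengths comparable.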
Functions for feature set construction
featureFreq(): calculate and construct a feature set using k-mer frequencies.
featureMotifs(): calculate and construct a feature set using motif patterns.
featurePhysChem(): calculate and construct a feature set using physicochemical properties.
featureStructure(): calculate and construct a feature set using secondary structural information.
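A feature set for RNA-protein interaction prediction typically pairs the features of each RNA with those of its protein partner. The sketch below assumes per-sequence feature matrices with sequence IDs as row names and a table of pairs with hypothetical columns `rna` and `protein`; `build_pair_features()` is illustrative only and does not reproduce how LION's `feature*()` functions lay out their output.

```r
# Hypothetical sketch (not LION's feature*() functions): build one feature row
# per RNA-protein pair by concatenating the RNA's and the protein's features.
build_pair_features <- function(pairs, rna_feat, pro_feat) {
  rna_feat <- as.matrix(rna_feat)
  pro_feat <- as.matrix(pro_feat)
  rna_ids <- as.character(pairs$rna)       # as.character guards against factors
  pro_ids <- as.character(pairs$protein)
  stopifnot(all(rna_ids %in% rownames(rna_feat)),
            all(pro_ids %in% rownames(pro_feat)))
  # Look up each pair's rows (IDs may repeat across pairs) and join them
  x <- cbind(rna_feat[rna_ids, , drop = FALSE],
             pro_feat[pro_ids, , drop = FALSE])
  colnames(x) <- c(paste0("rna_", colnames(rna_feat)),
                   paste0("pro_", colnames(pro_feat)))
  rownames(x) <- paste(rna_ids, pro_ids, sep = "_")
  x                                        # numeric matrix: pairs x features
}
```

Prefixing column names with `rna_` and `pro_` keeps the two feature blocks distinguishable when both sides share feature names (for example, k-mers computed on a common alphabet).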
Functions for random forest model training
randomForest_CV(): perform stratified k-fold cross-validation.
randomForest_RFE(): perform stratified feature selection using recursive feature elimination (RFE).
randomForest_tune(): tune mtry of the random forest model.
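To show what stratification means in cross-validation, the sketch below assigns samples to k folds class by class, so each fold keeps roughly the class proportions of the full dataset. `stratified_folds()` is a hypothetical helper, not the internals of `randomForest_CV()`, and the labels `Interact`/`Non.Interact` are assumptions.

```r
# Sketch of stratified k-fold assignment (not LION's randomForest_CV()).
stratified_folds <- function(labels, k = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  labels <- as.factor(labels)
  folds <- integer(length(labels))
  for (cls in levels(labels)) {
    idx <- which(labels == cls)
    # Shuffle this class's samples, then deal them out to folds 1..k in turn
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

labels <- rep(c("Interact", "Non.Interact"), times = c(60, 40))
folds <- stratified_folds(labels, k = 5, seed = 2022)
table(folds, labels)   # each fold holds 12 "Interact" and 8 "Non.Interact"
```

Each fold i can then serve as the test set while the remaining folds are used for training, for example with `randomForest::randomForest(x[folds != i, ], y[folds != i])`.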
Functions for RNA-protein interaction prediction with different methods
run_LION(): predict interactions, construct feature sets or retrain models using the LION method (this work).
run_LncADeep(): predict interactions (with a retrained random forest model), construct feature sets or retrain models using the LncADeep method. If you would like to use the original deep neural network-based model, please refer to the original repository.
run_lncPro(): predict interactions (supports the original algorithm and a retrained random forest model), construct feature sets or retrain models using the lncPro method. The original repository was not available when this README was published.
run_rpiCOOL(): predict interactions (with a retrained random forest model), construct feature sets or retrain models using the rpiCOOL method. The original repository was not available when this README was published.
run_RPISeq(): predict interactions (supports the original web-based algorithm and a retrained random forest model), construct feature sets or retrain models using the RPISeq method.
run_confidentPrediction(): perform confident prediction by employing all available methods. Users can further calculate the intersection/union of the results or build new models with the output of this function.
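The intersection/union step mentioned for `run_confidentPrediction()` can be illustrated with a small base-R sketch. It assumes a data frame with one column of predicted labels per method and one row per RNA-protein pair, with a hypothetical positive label `"Interact"`; `combine_predictions()` is not a LION function, and LION's actual output format may differ.

```r
# Hypothetical sketch: combine per-method predicted labels for the same pairs.
# "intersection": a pair is positive only if every method predicts it positive.
# "union": a pair is positive if at least one method predicts it positive.
combine_predictions <- function(pred, mode = c("intersection", "union"),
                                positive = "Interact") {
  mode <- match.arg(mode)
  is_pos <- as.matrix(pred) == positive     # logical matrix: pairs x methods
  if (mode == "intersection") {
    rowSums(!is_pos) == 0
  } else {
    rowSums(is_pos) > 0
  }
}

pred <- data.frame(
  LION   = c("Interact", "Interact", "Non.Interact"),
  RPISeq = c("Interact", "Non.Interact", "Non.Interact")
)
combine_predictions(pred, "intersection")   # TRUE FALSE FALSE
combine_predictions(pred, "union")          # TRUE TRUE FALSE
```

Intersection trades sensitivity for specificity, since a pair must be supported by every method, while union does the opposite.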
Other Utilities
formatSeq(): generate sequence pairs for feature extraction or prediction.
evaluatePrediction(): compute metrics, including TP, TN, FP, FN, Sensitivity, Specificity, Accuracy, F1-Score, MCC (Matthews Correlation Coefficient) and Cohen’s Kappa, to evaluate prediction results.
runPredator(): call Predator to process protein sequences (Predator is required).
runRNAsubopt(): call RNAsubopt to process RNA sequences (the ViennaRNA Package is required).
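The metrics reported by `evaluatePrediction()` follow standard confusion-matrix definitions. The sketch below computes them in base R for binary labels; `evaluate_binary()` is a hypothetical helper, not LION's implementation, and LION may handle edge cases (such as a zero denominator, which yields `NaN` here) differently.

```r
# Hypothetical helper (not LION's evaluatePrediction()): confusion-matrix
# metrics for binary predictions.
evaluate_binary <- function(truth, pred, positive = 1) {
  truth <- truth == positive
  pred <- pred == positive
  # Convert counts to double so the products below cannot overflow integers
  TP <- as.numeric(sum(pred & truth))
  TN <- as.numeric(sum(!pred & !truth))
  FP <- as.numeric(sum(pred & !truth))
  FN <- as.numeric(sum(!pred & truth))
  n <- TP + TN + FP + FN
  sens <- TP / (TP + FN)              # recall / true positive rate
  spec <- TN / (TN + FP)              # true negative rate
  acc <- (TP + TN) / n
  prec <- TP / (TP + FP)
  f1 <- 2 * prec * sens / (prec + sens)
  mcc <- (TP * TN - FP * FN) /
    sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  # Cohen's kappa: observed agreement (accuracy) corrected for the agreement
  # expected by chance from the marginal totals
  p_chance <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2
  kappa <- (acc - p_chance) / (1 - p_chance)
  c(TP = TP, TN = TN, FP = FP, FN = FN, Sensitivity = sens,
    Specificity = spec, Accuracy = acc, F1 = f1, MCC = mcc, Kappa = kappa)
}

truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)
round(evaluate_binary(truth, pred), 3)
# TP = 3, TN = 4, FP = 2, FN = 1, Accuracy = 0.7, Kappa = 0.4, MCC about 0.408
```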
To cite LION in publications, please use:
Siyu Han, Xiao Yang, Hang Sun, Yang Hu, Qi Zhang, Cheng Peng, Wensi Fang, Ying Li. LION: an integrated R package for effective prediction of ncRNA–protein interaction. Briefings in Bioinformatics. 2022; 23(6):bbac420. (doi: https://doi.org/10.1093/bib/bbac420)
- LION (this work): an integrated R package for effective prediction of lncRNA/ncRNA–protein interaction
- TIGER: technical variation elimination for metabolomics data using ensemble learning architecture
- LEOPARD: missing view completion for multi-timepoint omics data via representation disentanglement and temporal knowledge transfer