
An Integrated R Package for Effective Prediction of ncRNA- and lncRNA-protein Interaction

Primary language: R | License: GNU General Public License v3.0 (GPL-3.0)

LION


Understanding ncRNA-protein interaction is critical to unveiling the functions of ncRNAs. Many computational tools have been developed to facilitate research on ncRNA-protein interaction. Nonetheless, most of these tools produce unstable results and lack the flexibility required for dataset-specific prediction. Here we propose LION, an integrated package that comprises a new method for predicting ncRNA/lncRNA-protein interaction and a comprehensive strategy for customisable prediction. As an integrated tool for predicting ncRNA-protein interaction, LION can be used to build adaptable models for species- and tissue-specific prediction and can considerably enhance the performance of several widely used tools. Experimental results also demonstrate that our method outperforms its competitors on multiple benchmark datasets. We expect LION to be a powerful and efficient tool for the prediction and analysis of ncRNA- and lncRNA-protein interaction.

Thank you for checking out LION @BigCatZoo!

For any questions regarding LION, please email the zookeeper Siyu Han (siyu.han@tum.de) or open an issue.

Install LION

Using devtools

```r
# Enter the following commands in R:

# install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("HAN-Siyu/LION")
```

Alternatively, download the source package here and install it manually.

Versions earlier than v0.2.9.1 have an issue in calculating metrics. This issue did not affect the results reported in our paper. We recommend using the latest version; update details can be found in NEWS.

Habitat

Almost all dependencies are installed automatically when LION is installed in R. However, secondary structure features are computed using two standalone programs, RNAsubopt (from the ViennaRNA package) and Predator. You need to download both programs if you would like to use the lncPro method or extract structural features.
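Before running lncPro or the structural feature functions, it can help to confirm that R can actually see these programs. The sketch below uses only base R's Sys.which(); the helper name check_external_tools() is hypothetical, and "predator" is an assumed executable name, so adjust it to match how Predator is installed on your system.

```r
# Report whether each external program can be found on the system PATH.
# "RNAsubopt" is the ViennaRNA executable; "predator" is an ASSUMED name
# for the Predator executable and may differ on your system.
check_external_tools <- function(tools = c("RNAsubopt", "predator")) {
  paths <- Sys.which(tools)   # "" for any program that is not found
  found <- nzchar(paths)
  for (i in seq_along(tools)) {
    if (found[i]) {
      message(tools[i], ": found at ", paths[[i]])
    } else {
      message(tools[i], ": not found on PATH")
    }
  }
  invisible(found)
}

check_external_tools()
```

If a program is reported as missing, add its installation directory to PATH before starting R.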

Supporting Files

[PDF Manual] [Datasets and Raw Results]

How to Train Your LION

We expect LION to be a powerful package for predicting RNA-protein interaction within a uniform R environment. LION's functions fall into several groups covering feature extraction, interaction prediction and model tuning. Below is a basic summary of LION's functions; detailed examples and parameter explanations can be found in our manual.

Functions for feature extraction

  • computeFreq(): compute k-mer frequencies of RNA/protein sequences. Supports three amino acid representations, entropy density profile (EDP) computation and data normalization.
  • computeMLC(): compute the most-like coding (MLC) region of RNA sequences. Supports two strategies: longest open reading frame (ORF) and maximum subarray sum (MSS).
  • computeMotifs(): count the motifs in RNA/protein sequences. User-defined motifs are also supported.
  • computePhysChem(): compute physicochemical features of RNA/protein sequences.
  • computePhysChem_AAindex(): compute various physicochemical features of protein sequences using AAindex.
  • computeStructure(): compute the secondary structural features of RNA/protein sequences using ViennaRNA/Predator (both programs must be installed).
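As an illustration of the longest-ORF strategy mentioned for computeMLC(), the following base R sketch returns the longest forward-strand ORF of a sequence. It is not LION's implementation: the helper longest_orf() is hypothetical, it ignores ORFs that lack an in-frame stop codon, and it does not cover the MSS strategy.

```r
# Longest open reading frame on the forward strand (sketch, not LION's code).
# Assumptions: ORF = ATG ... first in-frame stop codon (TAA/TAG/TGA), stop
# codon included; U is converted to T; ORFs without a stop are ignored.
longest_orf <- function(s) {
  s <- gsub("U", "T", toupper(s))
  n <- nchar(s)
  stops <- c("TAA", "TAG", "TGA")
  best <- ""
  for (frame in 0:2) {
    if (n - frame < 3) next                       # no complete codon
    pos <- seq(1 + frame, n - 2, by = 3)          # codon start positions
    codons <- substring(s, pos, pos + 2)
    start_idx <- NA_integer_                      # index of the open ATG
    for (i in seq_along(codons)) {
      if (is.na(start_idx) && codons[i] == "ATG") {
        start_idx <- i
      } else if (!is.na(start_idx) && codons[i] %in% stops) {
        orf <- paste(codons[start_idx:i], collapse = "")
        if (nchar(orf) > nchar(best)) best <- orf
        start_idx <- NA_integer_                  # look for the next ATG
      }
    }
  }
  best
}

longest_orf("CCAUGAAAUUUUAGCC")  # returns "ATGAAATTTTAG"
```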

Functions for feature set construction

  • featureFreq(): calculate and construct a feature set using k-mer frequencies.
  • featureMotifs(): calculate and construct a feature set using motif patterns.
  • featurePhysChem(): calculate and construct a feature set using physicochemical properties.
  • featureStructure(): calculate and construct a feature set using secondary structural information.
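To make the idea of a k-mer frequency feature set concrete, here is a base R sketch that turns one RNA-protein pair into a single numeric vector. It also shows an entropy density profile using the common definition EDP_i = -p_i log2(p_i) / H, where H is the Shannon entropy of the k-mer frequencies. The helpers kmer_freq(), edp() and pair_features() are hypothetical; featureFreq() has its own parameters and output format, described in the manual.

```r
# Frequency of every possible k-mer over a fixed alphabet (sums to 1).
kmer_freq <- function(s, k = 2, alphabet = c("A", "C", "G", "U")) {
  s <- toupper(s)
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  kmers <- do.call(paste0, grid)            # all length(alphabet)^k k-mers
  counts <- setNames(numeric(length(kmers)), kmers)
  n <- nchar(s) - k + 1                     # number of overlapping windows
  if (n > 0) {
    obs <- substring(s, 1:n, k:nchar(s))
    obs <- obs[obs %in% kmers]              # drop windows with other letters
    counts[] <- as.numeric(table(factor(obs, levels = kmers)))
  }
  if (sum(counts) > 0) counts / sum(counts) else counts
}

# Entropy density profile: -p_i * log2(p_i) / H, with H the Shannon entropy.
edp <- function(freq) {
  out <- setNames(numeric(length(freq)), names(freq))
  nz <- freq > 0
  H <- -sum(freq[nz] * log2(freq[nz]))
  if (H > 0) out[nz] <- -freq[nz] * log2(freq[nz]) / H
  out
}

# One feature vector per RNA-protein pair: RNA k-mers, then protein k-mers.
pair_features <- function(rna, protein, k_rna = 2, k_protein = 1) {
  amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
  rna_f <- kmer_freq(gsub("T", "U", toupper(rna)), k_rna)
  pro_f <- kmer_freq(protein, k_protein, amino_acids)
  c(setNames(rna_f, paste0("rna_", names(rna_f))),
    setNames(pro_f, paste0("pro_", names(pro_f))))
}

x <- pair_features("ACGUACGUAA", "MKVLAAGIV")
length(x)  # 16 RNA 2-mers + 20 amino acids = 36
```

Stacking such vectors row by row (e.g. with rbind()) gives a feature table with one row per pair, which is the shape a random forest expects.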

Functions for random forest model training

  • randomForest_CV(): perform stratified k-fold cross-validation.
  • randomForest_RFE(): perform stratified feature selection using recursive feature elimination (RFE).
  • randomForest_tune(): tune the mtry parameter of random forest models.
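randomForest_CV() and randomForest_RFE() both rely on stratified splits. The base R sketch below (a hypothetical helper, stratified_folds(), not LION's code) shows the core idea: shuffle samples within each class and deal them out to k folds, so that every fold keeps roughly the original class ratio. The labels "Interact"/"Non.Interact" are illustrative only.

```r
# Assign samples to k folds, stratified by class label.
# Note: set.seed() changes the global random number generator state.
stratified_folds <- function(labels, k = 5, seed = 1) {
  set.seed(seed)
  folds <- integer(length(labels))
  for (cls in unique(labels)) {
    idx <- which(labels == cls)
    idx <- idx[sample.int(length(idx))]             # shuffle within the class
    folds[idx] <- rep_len(seq_len(k), length(idx))  # deal out round-robin
  }
  folds
}

labels <- rep(c("Interact", "Non.Interact"), times = c(50, 30))
folds <- stratified_folds(labels, k = 5)
table(folds, labels)   # 10 "Interact" and 6 "Non.Interact" per fold

for (i in 1:5) {
  train_idx <- which(folds != i)
  test_idx  <- which(folds == i)
  # fit a model on train_idx and evaluate it on test_idx here,
  # e.g. with randomForest::randomForest()
}
```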

Functions for RNA-protein interaction prediction with different methods

  • run_LION(): predict interactions, construct feature sets, or retrain models using the LION method (this work).
  • run_LncADeep(): predict interactions (retrained random forest model), construct feature sets, or retrain models using the LncADeep method. To use the original deep neural network-based model, please refer to the original repository.
  • run_lncPro(): predict interactions (supports both the original algorithm and a retrained random forest model), construct feature sets, or retrain models using the lncPro method. The original repository was not available when this README was published.
  • run_rpiCOOL(): predict interactions (retrained random forest model), construct feature sets, or retrain models using the rpiCOOL method. The original repository was not available when this README was published.
  • run_RPISeq(): predict interactions (supports both the web-based original algorithm and a retrained random forest model), construct feature sets, or retrain models using the RPISeq method.
  • run_confidentPrediction(): perform confident prediction by employing all available methods. Users can further calculate the intersection/union of the results or build new models from the output of this function.
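The intersection/union step mentioned for run_confidentPrediction() can be sketched as follows. This assumes, hypothetically, that the per-method predictions have been collected into a data frame with one column per method and labels such as "Interact"/"Non.Interact"; the helper consensus_prediction() and the column names are illustrative, so check the manual for the actual output format and label names.

```r
# Combine per-method predictions into one consensus call per pair.
# preds: data frame, one column per method (hypothetical layout).
consensus_prediction <- function(preds, rule = c("intersection", "union"),
                                 positive = "Interact",
                                 negative = "Non.Interact") {
  rule <- match.arg(rule)
  hits <- as.matrix(preds) == positive      # TRUE where a method calls a hit
  keep <- if (rule == "intersection") {
    rowSums(hits) == ncol(hits)             # every method calls a hit
  } else {
    rowSums(hits) > 0                       # at least one method calls a hit
  }
  unname(ifelse(keep, positive, negative))
}

preds <- data.frame(
  LION   = c("Interact", "Interact", "Non.Interact"),
  RPISeq = c("Interact", "Non.Interact", "Non.Interact")
)
consensus_prediction(preds, "intersection")  # "Interact" "Non.Interact" "Non.Interact"
consensus_prediction(preds, "union")         # "Interact" "Interact" "Non.Interact"
```

The intersection favours precision (fewer, more confident hits), while the union favours sensitivity.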

Other Utilities

  • formatSeq(): generate sequence pairs for feature extraction or prediction.
  • evaluatePrediction(): compute metrics, including TP, TN, FP, FN, Sensitivity, Specificity, Accuracy, F1-Score, MCC (Matthews Correlation Coefficient) and Cohen’s Kappa, to evaluate prediction results.
  • runPredator(): call Predator to process protein sequences (Predator is required).
  • runRNAsubopt(): call RNAsubopt to process RNA sequences (the ViennaRNA package is required).
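The metrics listed for evaluatePrediction() follow standard definitions. The base R sketch below computes them from predicted and true labels; evaluate_binary() and the default positive label "Interact" are hypothetical, and the sketch is meant for checking the formulas, not as a replacement for evaluatePrediction().

```r
# Standard binary-classification metrics from predicted and true labels.
# Undefined ratios (0/0) come out as NaN.
evaluate_binary <- function(pred, truth, positive = "Interact") {
  pred_pos <- pred == positive
  true_pos <- truth == positive
  # as.numeric() avoids integer overflow in the MCC/Kappa products
  TP <- as.numeric(sum(pred_pos & true_pos))
  TN <- as.numeric(sum(!pred_pos & !true_pos))
  FP <- as.numeric(sum(pred_pos & !true_pos))
  FN <- as.numeric(sum(!pred_pos & true_pos))
  n <- TP + TN + FP + FN
  sens <- TP / (TP + FN)                    # recall / true positive rate
  spec <- TN / (TN + FP)
  acc  <- (TP + TN) / n
  prec <- TP / (TP + FP)
  f1   <- 2 * prec * sens / (prec + sens)
  mcc  <- (TP * TN - FP * FN) /
    sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  # Cohen's kappa: observed agreement vs agreement expected by chance
  p_exp <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2
  kappa <- (acc - p_exp) / (1 - p_exp)
  c(TP = TP, TN = TN, FP = FP, FN = FN, Sensitivity = sens,
    Specificity = spec, Accuracy = acc, F1 = f1, MCC = mcc, Kappa = kappa)
}

truth <- c(rep("Interact", 4), rep("Non.Interact", 6))
pred  <- c("Interact", "Interact", "Interact", "Non.Interact",
           "Interact", "Interact", rep("Non.Interact", 4))
round(evaluate_binary(pred, truth), 3)
# TP 3, TN 4, FP 2, FN 1, Accuracy 0.7, MCC 0.408, Kappa 0.4
```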

Cite This Work

To cite LION in publications, please use:

Siyu Han, Xiao Yang, Hang Sun, Yang Hu, Qi Zhang, Cheng Peng, Wensi Fang, Ying Li. LION: an integrated R package for effective prediction of ncRNA–protein interaction. Briefings in Bioinformatics. 2022; 23(6):bbac420. (doi: https://doi.org/10.1093/bib/bbac420)

Our BigCatZoo:

  • LION (this work): an integrated R package for effective prediction of lncRNA/ncRNA–protein interaction
  • TIGER: technical variation elimination for metabolomics data using ensemble learning architecture
  • LEOPARD: missing view completion for multi-timepoint omics data via representation disentanglement and temporal knowledge transfer