
R package for visualization and exploratory analysis of Oxford Nanopore direct RNA seq based polyA predictions


nanotail


The goal of NanoTail is to provide a set of functions to manipulate and analyze polyA tail length estimates obtained from Oxford Nanopore Direct RNA sequencing with the Nanopolish software. Existing solutions, such as the "Pipeline for testing shifts in poly(A) tail lengths estimated by nanopolish", are, in our opinion, not sufficient for in-depth analysis of such data. The software is still under development, so all suggestions are welcome. Expect the code to change frequently, and use it with caution.

Installation

Prerequisites

Because the assertive package was removed from CRAN, nanotail currently cannot be installed unless assertive is first installed manually. We are working on replacing assertive with an equivalent package. For now, all assertive packages must be installed manually:

install.packages("devtools")
devtools::install_bitbucket("richierocks/assertive.properties") # install this one first as other packages depend on it
devtools::install_bitbucket(c("richierocks/assertive.files", "richierocks/assertive.strings", "richierocks/assertive.numbers", "richierocks/assertive.matrices", "richierocks/assertive.sets", "richierocks/assertive.models", "richierocks/assertive.reflection", "richierocks/assertive.types", "richierocks/assertive.datetimes", "richierocks/assertive.data", "richierocks/assertive.data.uk", "richierocks/assertive.data.us", "richierocks/assertive.code"))
devtools::install_bitbucket("richierocks/assertive") # install the umbrella package last

Now you can install the development version of NanoTail with:

devtools::install_github('smaegol/nanotail')
library(nanotail)

Input data

NanoTail needs output from nanopolish polya to work. It can read a single output file with read_polya_single:

path <- "/location/of/nanopolish/output"
polya_data <- read_polya_single(path)

It can also read multiple samples at once and associate metadata with them. Suppose we performed an experiment targeting one of the polyA polymerases (PAP): 2 replicates were sequenced for control samples and 2 replicates for samples with the mutant PAP, so we end up with 4 files of nanopolish polya output. To read all of them at once, use read_polya_multiple and supply the metadata in a samples_table data.frame:

samples_table <- data.frame(polya_path = c(path1,path2,path3,path4),
                            sample_name = c("wt1","mut1","wt2","mut2"),
                            group = c("wt","mut","wt","mut"))
polya_data_multiple <- read_polya_multiple(samples_table)
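Before importing, it can help to confirm that the samples table is well formed and that every path points to an existing file. The following is a minimal sketch in base R, not part of NanoTail: `check_samples_table` is a hypothetical helper, the file names are placeholders, and treating `polya_path` and `sample_name` as the required columns is an assumption based on the example above.

```r
# Hypothetical helper (not part of NanoTail): validate a samples table
# before passing it to read_polya_multiple().
check_samples_table <- function(samples_table) {
  # Assumption: these two columns are the ones the import relies on
  required <- c("polya_path", "sample_name")
  missing_cols <- setdiff(required, colnames(samples_table))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(samples_table$sample_name) > 0) {
    stop("sample_name values must be unique")
  }
  # Return the paths that do not point to an existing file
  paths <- as.character(samples_table$polya_path)
  paths[!file.exists(paths)]
}

# Placeholder paths for illustration only
samples_table <- data.frame(
  polya_path  = c("wt1_polya.tsv", "mut1_polya.tsv",
                  "wt2_polya.tsv", "mut2_polya.tsv"),
  sample_name = c("wt1", "mut1", "wt2", "mut2"),
  group       = c("wt", "mut", "wt", "mut"),
  stringsAsFactors = FALSE
)

# Character vector of paths that were not found (empty if all exist)
not_found <- check_samples_table(samples_table)
```

If `not_found` is empty, the table can be passed on to read_polya_multiple as shown above.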

To obtain nanopolish predictions, one can use the "Pipeline for calling poly(A) tail lengths from nanopore direct RNA data using nanopolish".

Shiny App

Once the data are imported, they can be processed in the R environment using the NanoTail functions described below or, more conveniently, explored in the interactive Shiny app. To launch the app for the data imported above:

nanoTailApp(polya_table = polya_data_multiple)

Nanopolish output QC

To get overall information about the output of the Nanopolish polya analysis, use the get_nanopolish_processing_info() function. The resulting summary can be plotted using plot_nanopolish_qc(). A summary of the analysis is also shown in the QC info tab of the Shiny App.

Nanopolish polya QC info shown in the Shiny App
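To illustrate what such a QC summary captures, the sketch below tabulates the `qc_tag` column that nanopolish polya writes for each read. It uses base R on a small synthetic table (all values are made up) and is not the implementation of get_nanopolish_processing_info(); it assumes the imported table keeps nanopolish's original column names.

```r
# Synthetic stand-in for a nanopolish polya output table (values are made up)
polya_toy <- data.frame(
  readname     = paste0("read", 1:6),
  contig       = c("tx1", "tx1", "tx2", "tx2", "tx2", "tx3"),
  polya_length = c(52.1, 48.7, 110.3, 95.0, 12.4, 70.2),
  qc_tag       = c("PASS", "PASS", "ADAPTER", "PASS", "SUFFCLIP", "NOREGION"),
  stringsAsFactors = FALSE
)

# Count reads per QC category and convert the counts to percentages
qc_counts  <- table(polya_toy$qc_tag)
qc_percent <- round(100 * prop.table(qc_counts), 1)

# Keep only reads that passed nanopolish QC for downstream analysis
polya_pass <- polya_toy[polya_toy$qc_tag == "PASS", ]
```

Restricting downstream analyses to reads tagged `PASS` is a common choice, since the other tags mark reads for which the tail estimate is considered unreliable.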

Global distribution of polyA lengths

The global distribution of polyA tail lengths can be plotted with the plot_polya_distribution() function, which produces a density plot for comparing the distribution of polyA lengths between samples. The same plot can be seen in the Global polya distribution tab of the Shiny App.

Example global distribution density plot

Statistical analysis of polyA predictions

NanoTail is intended to analyze differential adenylation. For this purpose, 3 statistical tests can be employed, allowing for comparison of the polyA lengths of individual transcripts between selected conditions.

Differential adenylation analysis can be performed with the calculate_polya_stats() function, or within the Differential adenylation tab in the Shiny App.

Differential adenylation tab
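The general idea of a per-transcript comparison can be sketched in base R as follows. This is a conceptual illustration on synthetic data, not the implementation of calculate_polya_stats; the Wilcoxon rank-sum test is used here only as an illustrative choice and is not necessarily one of the tests NanoTail applies. The column names `transcript`, `group` and `polya_length` are assumptions made for this example.

```r
set.seed(1)

# Synthetic polyA lengths for two transcripts in two groups (made-up data)
toy <- data.frame(
  transcript   = rep(c("tx1", "tx2"), each = 40),
  group        = rep(rep(c("wt", "mut"), each = 20), times = 2),
  polya_length = c(rnorm(20, 60, 10), rnorm(20, 90, 10),   # tx1: longer in mut
                   rnorm(20, 60, 10), rnorm(20, 60, 10)),  # tx2: no shift
  stringsAsFactors = FALSE
)

# One Wilcoxon rank-sum test per transcript, comparing wt and mut tails
per_transcript <- lapply(split(toy, toy$transcript), function(d) {
  test <- wilcox.test(polya_length ~ group, data = d)
  data.frame(
    transcript = d$transcript[1],
    median_wt  = median(d$polya_length[d$group == "wt"]),
    median_mut = median(d$polya_length[d$group == "mut"]),
    p_value    = test$p.value
  )
})
results <- do.call(rbind, per_transcript)

# Correct for testing many transcripts at once (Benjamini-Hochberg)
results$padj <- p.adjust(results$p_value, method = "BH")
```

Because one test is run per transcript, adjusting the p-values for multiple testing matters once thousands of transcripts are compared.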

Differential expression analysis

NanoTail also provides very basic differential expression testing, using binomTest from the edgeR package. This functionality is still under development and may not work as expected. To calculate differential expression, use the calculate_diff_exp_binom() function or the Shiny App.

Differential adenylation tab
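The logic behind a binomial test on read counts can be sketched with base R alone. The example below uses `binom.test` from the stats package as a stand-in rather than edgeR's binomTest, and it is not the implementation of calculate_diff_exp_binom(). The counts and library sizes are made up for illustration.

```r
# Made-up read counts per transcript in two conditions
counts <- data.frame(
  transcript = c("tx1", "tx2", "tx3"),
  wt         = c(100, 50, 10),
  mut        = c(100, 150, 12)
)

# Hypothetical library sizes (total reads sequenced per condition)
n_wt  <- 10000
n_mut <- 10000

# Under the null hypothesis, a read of a given transcript comes from the
# wt library with probability n_wt / (n_wt + n_mut)
p_null <- n_wt / (n_wt + n_mut)

# Exact binomial test per transcript, conditioning on its total read count
counts$p_value <- mapply(function(y_wt, y_mut) {
  binom.test(y_wt, y_wt + y_mut, p = p_null)$p.value
}, counts$wt, counts$mut)

# Adjust for multiple testing (Benjamini-Hochberg)
counts$padj <- p.adjust(counts$p_value, method = "BH")
```

With equal library sizes, a transcript with the same count in both conditions (tx1) gives no evidence of a difference, while a threefold change in counts (tx2) yields a very small p-value.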

Citation

Please cite NanoTail as: Krawczyk PS et al., NanoTail - R package for exploratory analysis of Nanopore Direct RNA based polyA lengths estimations.

Preprint in preparation.

TBD & plans

  • Import of polya predictions from software other than NanoPolish (poreplex, tailfindr, ?)
  • Analysis of predictions based on genome-mapping (now only transcriptome-mapping is supported)
  • Annotation of results and enrichment analysis
  • Squiggle visualization of polyA tails

Support

Any issues connected with NanoTail should be addressed to Pawel Krawczyk (p.krawczyk (at) ibb.waw.pl).