Code for processing and analysis of RedaCT-Seq data. This protocol identifies RNA modifications (specifically, ac4C) by analyzing basecalling changes following reduction of ac4C with Sodium Borohydride (NaBH4).
For more detail, please see the corresponding publication:
Sturgill D, Arango D, Oberdoerffer S.
Protocol for base resolution mapping of ac4C using RedaC:T-seq.
STAR Protoc. 2022 Dec 16;3(4):101858. doi: 10.1016/j.xpro.2022.101858.
PMID: 36595942; PMCID: PMC9676198.
The steps below will guide through the steps required to perform the analysis. It's recommended to perform this analysis in a cluster or high-performance computing environment
This is a list of components needed:
- samtools v.1.11
- cutadapt v.1.6
- STAR aligner v.2.7.5a
- mpileup2readcounts
- R v.4.1.1
- R-studio v.2022.02.3 Build 492
R packages required:
- data.table v.1.14.2
Matt Dowle and Arun Srinivasan (2021) https://CRAN.R-project.org/package=data.table - dplyr v.1.0.8
Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2022). dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr - Genomation v1.24.0
Akalin A, Franke V, Vlahovicek K, Mason CE, Schubeler D, (2014) genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics. doi: 10.1093/bioinformatics/btu775 https://bioconductor.org/packages/release/bioc/html/genomation.html - GenomicFeatures v1.44.2
Lawrence M, Huber W, Pages H, Aboyoun P, Carlson M, et al. (2013) Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol 9(8): e1003118. https://bioconductor.org/packages/release/bioc/html/GenomicFeatures.html - rtracklayer v.1.52.1
M. Lawrence, R. Gentleman, V. Carey: "rtracklayer: an {R} package for interfacing with genome browsers". Bioinformatics 25:1841-1842. https://bioconductor.org/packages/release/bioc/html/rtracklayer.html - stringr v.1.4.0
Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr
The repository below includes sample pileup data, corresponding to the input of the parsing step #2 in the Workflow section. Step #1 of the workflow shows how this sample data is derived. In this datafile, results for each sample are reported in columns for each reference location ('chr' and 'loc') columns. The reference base refers to the forward strand. Total coverage is given in the 'depth' column, and each base call is given in ACGT/acgt, where the case represents the stand the read mapped to. Note that this differs from a determination of the transcribed strand.
https://figshare.com/s/d7ab88c65c69ec0e097d
- Perform the pileup
To reduce file size, you could require a minimum depth of 10 in each sample, for example:
samtools mpileup -A -R -Q20 -C0 -d 100000 --ff UNMAP,SECONDARY,QCFAIL,DUP -f /data/indexes/STAR/hg19_UCSC/ref.fa WT.BH4.chr19.bam KO.BH4.chr19.bam WT.Ctrl.chr19.bam | sed 's/ / * */g' | mpileup2readcounts 0 -5 true 0 0 > mpileup_output/mpileup_output.txt ;
samtools mpileup -A -R -Q20 -C0 -d 100000 --ff UNMAP,SECONDARY,QCFAIL,DUP -f /data/indexes/STAR/hg19_UCSC/ref.fa WT.BH4.chr19.bam KO.BH4.chr19.bam WT.Ctrl.chr19.bam | sed 's/ / * */g' | mpileup2readcounts 0 -5 true 0 0 | awk '$4 >= 10 && $15 >= 10 && $26 >= 10' > mpileup_output/mpileup_output_chr19_min10.txt ;
- Parse the pileup results
scripts/redact_parse_script.pl sampledata/mpileup_output_chr19_min10.txt 3 > sampledata/mpileup_output_chr19_min10_parsed.txt ;
- Open results in R: Downstream analysis is performed in R (https://www.r-project.org/). An example analysis is provided in this Markdown file
Dave Sturgill - dave.sturgill@gmail.com or Daniel Arango - dany33co@gmail.com