RedaCT-Seq

Reduction of ac4C and NaBH4 sequencing

RedaC:T-seq analysis example code

NaBH4-induced Reduction of ac4C and conversion to Thymidine followed by sequencing

Code for processing and analyzing RedaCT-Seq data. The protocol identifies an RNA modification, N4-acetylcytidine (ac4C), by analyzing base-calling changes that follow reduction of ac4C with sodium borohydride (NaBH4).

For more detail, please see the corresponding publication:

Sturgill D, Arango D, Oberdoerffer S.
Protocol for base resolution mapping of ac4C using RedaC:T-seq.
STAR Protoc. 2022 Dec 16;3(4):101858. doi: 10.1016/j.xpro.2022.101858.
PMID: 36595942; PMCID: PMC9676198.

Getting Started

The steps below walk through the analysis. We recommend running it on a cluster or other high-performance computing environment.

Prerequisites

This is a list of components needed:

R packages required:

Retrieve the data

The repository linked below includes sample pileup data, which is the input to the parsing step (step 2 of the Workflow section). Step 1 of the workflow shows how this sample data is derived. In this data file, each row is a reference location (the 'chr' and 'loc' columns), and the results for each sample are reported in that sample's own set of columns. The reference base refers to the forward strand. Total coverage is given in the 'depth' column, and the count of each base call is given in the ACGT/acgt columns, where the case indicates the strand the read mapped to. Note that this is not the same as determining the transcribed strand.
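
As an illustration of how the per-base counts described above can be read, here is a minimal Python sketch that computes the fraction of C>T-type calls at one reference position. It is not part of the repository's scripts. The function name `mismatch_fraction`, the dictionary layout, and the decision to pool upper- and lower-case counts are assumptions made for this example. Because the reference base is given for the forward strand, the sketch also treats G>A at a reference G as the same change seen from the opposite strand. It makes no attempt to determine the transcribed strand.

```python
from typing import Dict, Optional


def mismatch_fraction(ref_base: str, counts: Dict[str, int]) -> Optional[float]:
    """Fraction of C>T-type calls at one position, pooling both read strands.

    `counts` maps the base-call column names (A, C, G, T, a, c, g, t) to read
    counts for a single sample. This layout is an assumption for illustration.
    """
    ref = ref_base.upper()
    if ref == "C":
        ref_key, alt_key = "C", "T"
    elif ref == "G":
        # Same change seen on the opposite strand of the forward reference.
        ref_key, alt_key = "G", "A"
    else:
        return None  # a C>T change cannot be observed at A or T positions

    # Pool reads mapped to either strand (upper- and lower-case columns).
    ref_n = counts.get(ref_key, 0) + counts.get(ref_key.lower(), 0)
    alt_n = counts.get(alt_key, 0) + counts.get(alt_key.lower(), 0)
    total = ref_n + alt_n
    return alt_n / total if total else None


# Hypothetical counts for one sample at a reference C position.
example = {"A": 0, "C": 90, "G": 0, "T": 6, "a": 0, "c": 100, "g": 0, "t": 4}
print(mismatch_fraction("C", example))
```

Comparing this fraction between the NaBH4-treated and control samples at the same position is the kind of base-calling change the protocol looks for. The actual statistics are handled by the parsing script and the downstream R analysis.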

https://figshare.com/s/d7ab88c65c69ec0e097d

Workflow

  1. Perform the pileup
     samtools mpileup -A -R -Q20 -C0 -d 100000 --ff UNMAP,SECONDARY,QCFAIL,DUP -f /data/indexes/STAR/hg19_UCSC/ref.fa WT.BH4.chr19.bam KO.BH4.chr19.bam WT.Ctrl.chr19.bam | sed 's/        /    *     */g' | mpileup2readcounts 0 -5 true 0 0 > mpileup_output/mpileup_output.txt ;
     To reduce file size, you can require a minimum depth of 10 in each sample (fields 4, 15, and 26 are the depth columns of the three samples), for example:
     samtools mpileup -A -R -Q20 -C0 -d 100000 --ff UNMAP,SECONDARY,QCFAIL,DUP -f /data/indexes/STAR/hg19_UCSC/ref.fa WT.BH4.chr19.bam KO.BH4.chr19.bam WT.Ctrl.chr19.bam | sed 's/        /    *     */g' | mpileup2readcounts 0 -5 true 0 0 | awk '$4 >= 10 && $15 >= 10 && $26 >= 10' > mpileup_output/mpileup_output_chr19_min10.txt ;
  2. Parse the pileup results
     scripts/redact_parse_script.pl sampledata/mpileup_output_chr19_min10.txt 3 > sampledata/mpileup_output_chr19_min10_parsed.txt ;
  3. Open the results in R. Downstream analysis is performed in R (https://www.r-project.org/). An example analysis is provided in this Markdown file.
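
The minimum-depth filter in step 1 can also be applied after the fact to an unfiltered pileup file. The Python sketch below mirrors the awk condition, assuming the same three-sample layout in which the depth values sit in 1-based fields 4, 15, and 26. The function name `filter_min_depth` and its parameters are hypothetical. Unlike the awk one-liner, this sketch skips any line whose depth fields are missing or not integers, such as a header line.

```python
from typing import Iterable, Iterator, Sequence


def filter_min_depth(
    lines: Iterable[str],
    depth_fields: Sequence[int] = (4, 15, 26),  # 1-based, as in the awk filter
    min_depth: int = 10,
) -> Iterator[str]:
    """Yield only the lines where every sample reaches the minimum depth."""
    for line in lines:
        fields = line.split()  # whitespace splitting, like awk's default
        try:
            depths = [int(fields[i - 1]) for i in depth_fields]
        except (IndexError, ValueError):
            continue  # header or malformed line: skipped in this sketch
        if all(d >= min_depth for d in depths):
            yield line
```

A possible use is to read `mpileup_output/mpileup_output.txt` line by line, pass the file object to `filter_min_depth`, and write the yielded lines to a new file before running the parsing script.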

Contact

Dave Sturgill - dave.sturgill@gmail.com or Daniel Arango - dany33co@gmail.com