RADAR

RADAR (RNA-editing Analysis-pipeline to Decode All twelve-types of RNA-editing events) is designed to detect and visualize all twelve possible types of RNA-editing events from RNA-seq datasets.

Features

RADAR identifies RNA-editing events from RNA-seq data by applying stringent filtering steps.

All possible RNA-editing events from each given RNA-seq dataset are summarized into an Excel file.
The numbers of all twelve types of RNA-editing events are plotted as histograms according to their genomic locations: Alu, repetitive non-Alu, and non-repetitive regions.
Manhattan plots are further used to illustrate RNA editing ratios of selected types of RNA-editing events, such as C-to-U or A-to-G.

Schema

Installation requirements

RADAR needs no setup step: once downloaded and unzipped, it can be run directly, provided that the tools it depends on are installed:

HISAT2 (>=2.0.5)
BWA (>=0.7.9)
Samtools (>=1.7)
GATK (>=4.0.1.0)
Perl (>=5.18)
Python (>=2.7.8)
Dependent Python modules:
- pysam (>=0.12.0.1)
R (>=3.0.2)
Dependent R packages:
- ggplot2
- dplyr
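
A quick way to see whether the tools listed above are reachable is to look them up on PATH. The following is a minimal sketch and not part of RADAR. The executable names (hisat2, bwa, samtools, gatk, perl, python, R) are assumed to match a typical installation, and the sketch does not check version numbers or the pysam, ggplot2, and dplyr modules.

```shell
# Print the name of every listed tool that cannot be found on PATH.
# Prints nothing when all of them are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names are assumptions; adjust them if your binaries are named differently.
missing=$(missing_tools hisat2 bwa samtools gatk perl python R)
if [ -n "$missing" ]; then
    echo "Missing from PATH:" $missing
else
    echo "All listed tools were found on PATH."
fi
```

Versions still need to be compared by hand against the minimums listed above, for example with "samtools --version" or "gatk --version".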

Installation

Once all installation requirements are fulfilled, RADAR can be run directly after it is downloaded and unzipped, with no further setup.

git clone https://github.com/YangLab/RADAR
cd RADAR
chmod +x RADAR
./RADAR -h

Configuration

The reference genome, genomic sequence indexes, and genomic annotations must be provided to RADAR in the RADAR.conf file. Examples of how to obtain these annotations are provided in src/tools_preparation_of_RADAR_conf_annotation.sh.

Reference and sequence index

Genome build version of the reference genome.
- Example: genome_build_version=hg38
Ribosomal DNA (rDNA) sequence index for BWA MEM, which can be created by command "bwa index ~/reference/Human/RNA_45S5/RNA45S5.fa"
- Example: rDNA_idnex_bwa_mem=~/reference/Human/RNA_45S5/RNA45S5.fa
Path to reference genome
- Example: genome_fasta=~/reference/Human/hg38/hg38_all.fa
Reference genome sequence index for HISAT2, which can be created by command "hisat2-build ~/reference/Human/hg38/hg38_all.fa ~/reference/Human/hg38/hg38_all.fa"
- Example: genome_index_hisat2=~/reference/Human/hg38/hg38_all.fa
Reference genome sequence index for BWA MEM, which can be created by command "bwa index ~/reference/Human/hg38/hg38_all.fa"
- Example: genome_index_bwa_mem=~/reference/Human/hg38/hg38_all.fa
Reference genome sequence index for Blat, which can be created by command "RADAR/tools/faToTwoBit ~/reference/Human/hg38/hg38_all.fa ~/reference/Human/hg38/hg38_all.fa.2bit"
- Example: genome_index_blat=~/reference/Human/hg38/hg38_all.fa.2bit
Reference genome sequence index for GATK, located in the same directory as the reference genome, which can be created by command "gatk CreateSequenceDictionary -R ~/reference/Human/hg38/hg38_all.fa"
- Example: genome_index_gatk=~/reference/Human/hg38/hg38_all.dict

Variants annotation: dbSNP, 1000Genome, EVS

Complete variant annotation from NCBI dbSNP, with its GATK index in the same directory.
- Example of the complete .vcf file: dbSNP_all=~/annotation/Human/hg38/SNP/dbSNP_b151/NCBI_dbSNP_b151_all_hg38.vcf
- Example of the GATK index for the complete .vcf file (which can be created by command "gatk IndexFeatureFile -F ~/annotation/Human/hg38/SNP/dbSNP_b151/NCBI_dbSNP_b151_all_hg38.vcf"): dbSNP_all_index_gatk=~/annotation/Human/hg38/SNP/dbSNP_b151/NCBI_dbSNP_b151_all_hg38.vcf.idx
SNP annotation from NCBI dbSNP divided by chromosome
- Example of the folder for NCBI dbSNP divided by chromosome: SNP_dbSNP_divided_by_chromosome=~/annotation/Human/hg38/SNP/dbSNP_b151/split_chr
SNP annotation from The 1000 Genomes Project divided by chromosome
- Example of the folder for SNP divided by chromosome: SNP_1000Genome_divided_by_chromosome=~/annotation/Human/hg38/SNP/1000genomes/split_chr
SNP annotation from The University of Washington Exome Sequencing Project divided by chromosome
- Example of the folder for SNP divided by chromosome: SNP_EVS_divided_by_chromosome=~/annotation/Human/hg38/SNP/EVS/split_chr

Genome annotation

Annotations of Alu, repetitive non-Alu, and all repetitive genomic regions in BED format
- Example of Alu annotation: annotation_Alu=~/annotation/Human/hg38/Alu.bed
- Example of repetitive non-Alu annotation: annotation_Repetitive_non_Alu=~/annotation/Human/hg38/Repetitive_non-Alu.bed
- Example of all repetitive annotation: annotation_All_repetitive=~/annotation/Human/hg38/All_repetitive.bed
Annotation of RepeatMasker simple repeats from UCSC in BED format
- Example: annotation_simple_repeats=~/annotation/Human/hg38/UCSC_RepeatMask_SimpleRepeats_hg38.bed
Annotation of splice sites in BED format, which is used as the input of the option "--known-splicesite-infile" during HISAT2 mapping. The official HISAT2 website describes how to create it.
- Example: annotation_splice_sites=~/annotation/Human/hg38/ref_all_spsites_hg38.bed
Annotation of intronic 4 bp flanking splice sites
- Example: annotation_intronic_4site=~/annotation/Human/hg38/hg38_intronic_4site.bed
Annotation of transcribed strands of genes
- Example: annotation_gene_transcribed_strands=~/annotation/Human/hg38/ref_UCSC_refFlat.bed
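
Once the entries above are filled in, a mistyped path is easier to catch before a long run than after it. The sketch below is a hypothetical helper, not part of RADAR. It assumes RADAR.conf consists of plain key=value lines like the examples above, and it only tests that each value exists on disk; it does not verify that an index is complete.

```shell
# List RADAR.conf entries whose value is not an existing file or directory.
# genome_build_version is skipped because its value is a name, not a path.
missing_conf_paths() {
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case $key in
            ''|'#'*|genome_build_version) continue ;;
        esac
        # Expand a leading "~/" by hand; the shell does not expand it inside a variable.
        case $value in
            '~/'*) value="$HOME/${value#??}" ;;
        esac
        [ -e "$value" ] || echo "$key=$value"
    done < "$1"
}

# Check the configuration file given as the first argument, or ./RADAR.conf by default.
conf=${1:-RADAR.conf}
if [ -f "$conf" ]; then
    missing_conf_paths "$conf"
fi
```

Each line of output names one entry whose path could not be found; no output means every path exists.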

All genome annotations are in BED format:

Field	Description
chrom	Chromosome
start	Start position
end	End position
name	Repeat name or gene symbol/gene name
score	Smith-Waterman alignment score for repeat region
strand	+ or - for strand
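
The six-column layout in the table can be checked mechanically. The following is a minimal sketch, assuming tab-separated fields exactly as listed above; the function name check_bed6 and the example path are illustrative, and the score column is not inspected.

```shell
# Succeed only if every line has six tab-separated fields, integer start and
# end positions with start not greater than end, and a strand of + or -.
# Offending lines are printed with their line numbers.
check_bed6() {
    awk -F '\t' '
        NF != 6 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ ||
        $2 + 0 > $3 + 0 || ($6 != "+" && $6 != "-") {
            print "line " NR ": " $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example: check one annotation file (the default path is illustrative).
bed=${1:-"$HOME/annotation/Human/hg38/Alu.bed"}
if [ -f "$bed" ]; then
    check_bed6 "$bed" && echo "$bed matches the six-column layout"
fi
```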

Usage

The RADAR pipeline breaks down into three main steps, of which read mapping and RNA-editing calling are run together as a single part:

PART 1: Read mapping and RNA-editing calling

For paired-end RNA-seq data:
COMMAND: ./RADAR read_mapping_and_RNA_editing_calling -1 "full_path_of_fastq1" -2 "full_path_of_fastq2" --stranded "true/false" -n "outname" -o "output_dir" -t "maximum_threads"
For single-end RNA-seq data:
COMMAND: ./RADAR read_mapping_and_RNA_editing_calling -s "full_path_of_fastq" --stranded "true/false" -n "outname" -o "output_dir" -t "maximum_threads"

Options

-s | --single | -single: FASTQ file for the single-end RNA-seq data.
-1 | --fq1 | -fq1 and -2 | --fq2 | -fq2: FASTQ files for the paired-end RNA-seq data.
--rna-strandness | -rna-strandness: The strand-specific information used for HISAT2 mapping. For single-end reads, use F or R. For paired-end reads, use either FR or RF. A detailed description of this option is available in the HISAT2 manual.
-n | --outname | -outname: The file-name prefix for the RNA-editing results. Three result files are created in the output directory: "outname_Alu.vcf", "outname_Repetitive_non_Alu.vcf", and "outname_Non_repetitive.vcf".
-o | --outdir | -outdir: Output directory of the results.
-t | --thread | -thread: Maximum threads used for computation.
-h | --help | -help: Print help information.
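
When several samples are processed, it can help to generate the PART 1 commands and review them before running anything. The sketch below only prints commands. It uses the option names exactly as written in the paired-end COMMAND line above (check "./RADAR -h" for the options your copy accepts), and the sample names, directories, and the "sample_1.fastq" / "sample_2.fastq" naming are hypothetical.

```shell
# Print (do not run) the PART 1 command for one paired-end sample.
# Arguments: sample name, FASTQ directory, stranded (true/false),
# output directory, maximum threads.
part1_command() {
    printf './RADAR read_mapping_and_RNA_editing_calling -1 "%s/%s_1.fastq" -2 "%s/%s_2.fastq" --stranded "%s" -n "%s" -o "%s" -t "%s"\n' \
        "$2" "$1" "$2" "$1" "$3" "$1" "$4" "$5"
}

# Hypothetical samples and paths; replace them with your own.
for sample in s1_rep1 s1_rep2 s1_rep3; do
    part1_command "$sample" /data/fastq true /data/radar_out 8
done
```

After checking the printed lines, they can be saved to a file and executed with sh.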

PART 2: RNA-editing result summarization and visualization

1. RNA-editing result summarization

COMMAND: ./RADAR RNA_editing_summarization -i "outdir_of_PART1" -o "file_of_result"

Options

2. Histogram plot for each treatment

COMMAND: ./RADAR histogram -i "outdir_of_PART1" -n "outnames_of_replicates_from_PART1" -o "file_of_plot"

Options

-i | --inputdir | -inputdir: The directory of the PART1 read mapping and RNA-editing results.
-n | --outname_of_replicates | -outname_of_replicates: The outnames of the PART1 RNA-editing results for multiple replicates from the same treatment. Outnames are separated by commas, for example, "s1_rep1,s1_rep2,s1_rep3".
-o | --output | -output: Full path of the pdf file for the histogram.
-h | --help | -help: Print help information.
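
The comma-separated value for -n can be assembled from the PART 1 output directory rather than typed by hand. This is a sketch under one assumption: that the result files named "outname_Alu.vcf", as described for the PART 1 -n option, sit directly in that directory. The helper name list_outnames and the output file name histogram.pdf are illustrative.

```shell
# Print the outnames found in a PART 1 output directory as one
# comma-separated string, inferred from files named <outname>_Alu.vcf.
list_outnames() {
    for f in "$1"/*_Alu.vcf; do
        [ -e "$f" ] || continue      # the glob matched nothing
        basename "$f" _Alu.vcf
    done | paste -sd ',' -
}

# Print (do not run) the matching histogram command.
outdir=${1:-.}
names=$(list_outnames "$outdir")
if [ -n "$names" ]; then
    echo "./RADAR histogram -i \"$outdir\" -n \"$names\" -o \"$outdir/histogram.pdf\""
fi
```

The list contains every outname in the directory, so trim it to the replicates of a single treatment before passing it to the histogram command.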

3. Manhattan plot of specific RNA-editing type

COMMAND: ./RADAR Manhattan_plot -i "outdir_of_PART1" --RNA_editing_type "RNA_editing_type" -n "outname_of_samples_from_PART1" -c "colors_of_samples_in_the_plot" -o "file_of_plot"

Options

-i | --inputdir | -inputdir: The directory of the PART1 read mapping and RNA-editing results.
--RNA_editing_type | -RNA_editing_type: The RNA-editing type to show in the Manhattan plot, chosen from the twelve types of RNA editing: A-to-C, A-to-G, A-to-U, C-to-A, C-to-G, C-to-U, G-to-A, G-to-C, G-to-U, U-to-A, U-to-C, U-to-G.
-n | --outname_of_samples | -outname_of_samples: Outnames of samples from the PART1 RNA-editing results. Outnames are separated by commas, for example, "s1_rep1,s1_rep2,s1_rep3,s2_rep1,s2_rep2,s2_rep3".
-c | --color_of_samples | -color_of_samples: Colors, in hex RGB format, for the dots of the samples in the plot. Colors should be enclosed in double quotation marks and separated by commas (,), for example, "#919191,#919191,#FF3F00,#FF3F00,#FF3F00". For multiple samples, provide one color per sample, in the same order as the samples; for a single sample, provide two colors to distinguish adjacent chromosomes.
-o | --output | -output: Full path of the pdf file for the Manhattan plot.
-h | --help | -help: Print help information.
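
A mismatch between the -n and -c lists is easy to make. The sketch below checks the two strings before the plot is requested. It reads the rule above as one color per sample, or exactly two colors for a single sample; that reading, and the helper names, are assumptions rather than RADAR behavior.

```shell
# Count the comma-separated items in a string.
count_items() {
    printf '%s\n' "$1" | awk -F ',' '{ print NF }'
}

# Succeed only if every color is '#' followed by six hex digits and the
# number of colors fits the number of samples.
check_colors() {
    n_samples=$(count_items "$1")
    n_colors=$(count_items "$2")
    # Fail if any entry is not of the form #RRGGBB.
    if printf '%s\n' "$2" | tr ',' '\n' | grep -Evq '^#[0-9A-Fa-f]{6}$'; then
        return 1
    fi
    if [ "$n_samples" -eq 1 ]; then
        [ "$n_colors" -eq 2 ]
    else
        [ "$n_colors" -eq "$n_samples" ]
    fi
}

# Hypothetical sample and color lists.
samples="s1_rep1,s1_rep2,s2_rep1,s2_rep2"
colors="#919191,#919191,#FF3F00,#FF3F00"
if check_colors "$samples" "$colors"; then
    echo "color list matches the sample list"
else
    echo "color list does not match the sample list"
fi
```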
