RADAR (RNA-editing Analysis-pipeline to Decode All twelve-types of RNA-editing events) detects and visualizes all twelve possible types of RNA-editing events from RNA-seq datasets.
RADAR identifies RNA-editing events from RNA-seq data using stringent filtering steps.
- All possible RNA-editing events from each given RNA-seq dataset are summarized into an Excel file.
- The numbers of all twelve types of RNA-editing events are plotted as histograms according to their genomic locations in Alu, repetitive non-Alu and non-repetitive regions.
- Manhattan plots are further used to illustrate RNA editing ratios of selected types of RNA-editing events, such as C-to-U or A-to-G.
RADAR can be run directly after downloading and unzipping, with no setup step, provided that the tools it depends on are installed:
- HISAT2 (>=2.0.5)
- BWA (>=0.7.9)
- Samtools (>=1.7)
- GATK (>=4.0.1.0)
- Perl (>=5.18)
- Python (>=2.7.8)
Dependent Python modules:
- R (>=3.0.2)
Dependent R packages:
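
Before the first run, it can help to confirm that the command-line tools listed above are on the PATH. The following is a minimal sketch and not part of RADAR: the function name `missing_tools` is hypothetical, the executable names in the example call are assumptions about how the tools are installed, and minimum versions are not checked.

```shell
#!/bin/sh
# Print the name of every tool given as an argument that is not found on PATH.
# Prints nothing when all tools are available.
missing_tools() (
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
)

# Example call (executable names are assumptions, adjust to your installation):
# missing_tools hisat2 bwa samtools gatk perl python Rscript
```
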
Once all installation requirements are fulfilled, RADAR can be run directly after downloading and unzipping, with no setup step.
git clone https://github.com/YangLab/RADAR
cd RADAR
chmod +x RADAR
./RADAR -h
The reference genome, genomic sequence indexes and genomic annotations must be provided to RADAR in the RADAR.conf file. Examples of how to obtain these annotations are provided in src/tools_preparation_of_RADAR_conf_annotation.sh.
- Genome build version of the reference genome.
- Example:
genome_build_version=hg38
- Ribosomal DNA (rDNA) sequence index for BWA MEM, which can be created by command "bwa index ~/reference/Human/RNA_45S5/RNA45S5.fa"
- Example:
rDNA_idnex_bwa_mem=~/reference/Human/RNA_45S5/RNA45S5.fa
- Path to reference genome
- Example:
genome_fasta=~/reference/Human/hg38/hg38_all.fa
- Reference genome sequence index for HISAT2, which can be created by command "hisat2-build ~/reference/Human/hg38/hg38_all.fa ~/reference/Human/hg38/hg38_all.fa"
- Example:
genome_index_hisat2=~/reference/Human/hg38/hg38_all.fa
- Reference genome sequence index for BWA MEM, which can be created by command "bwa index ~/reference/Human/hg38/hg38_all.fa"
- Example:
genome_index_bwa_mem=~/reference/Human/hg38/hg38_all.fa
- Reference genome sequence index for Blat, which can be created by command "RADAR/tools/faToTwoBit ~/reference/Human/hg38/hg38_all.fa ~/reference/Human/hg38/hg38_all.fa.2bit"
- Example:
genome_index_blat=~/reference/Human/hg38/hg38_all.fa.2bit
- Reference genome sequence index for GATK, located in the same directory as the reference genome, which can be created by command "gatk CreateSequenceDictionary -R ~/reference/Human/hg38/hg38_all.fa"
- Example:
genome_index_gatk=~/reference/Human/hg38/hg38_all.dict
- Annotation of all variants from NCBI dbSNP, with its GATK index in the same directory.
- Example of the total .vcf file:
dbSNP_all=~/annotation/Human/hg38/SNP/dbSNP_b151/NCBI_dbSNP_b151_all_hg38.vcf
- Example of the GATK index for total .vcf (which can be created by command "gatk IndexFeatureFile -F ~/annotation/Human/hg38/SNP/dbSNP_b151/NCBI_dbSNP_b151_all_hg38.vcf"):
dbSNP_all_index_gatk=~/annotation/Human/hg38/SNP/dbSNP_b151/NCBI_dbSNP_b151_all_hg38.vcf.idx
- SNP annotation from NCBI dbSNP divided by chromosome
- Example of the folder for NCBI dbSNP divided by chromosome:
SNP_dbSNP_divided_by_chromosome=~/annotation/Human/hg38/SNP/dbSNP_b151/split_chr
- SNP annotation from The 1000 Genomes Project divided by chromosome
- Example of the folder for SNP divided by chromosome:
SNP_1000Genome_divided_by_chromosome=~/annotation/Human/hg38/SNP/1000genomes/split_chr
- SNP annotation from The University of Washington Exome Sequencing Project divided by chromosome
- Example of the folder for SNP divided by chromosome:
SNP_EVS_divided_by_chromosome=~/annotation/Human/hg38/SNP/EVS/split_chr
- Annotation of Alu, repetitive non-Alu and all repetitive genomic regions in BED format
- Example of Alu annotation:
annotation_Alu=~/annotation/Human/hg38/Alu.bed
- Example of repetitive non-Alu annotation:
annotation_Repetitive_non_Alu=~/annotation/Human/hg38/Repetitive_non-Alu.bed
- Example of all repetitive annotation:
annotation_All_repetitive=~/annotation/Human/hg38/All_repetitive.bed
- Annotation of RepeatMasker simple repeats from UCSC in BED format
- Example:
annotation_simple_repeats=~/annotation/Human/hg38/UCSC_RepeatMask_SimpleRepeats_hg38.bed
- Annotation of splice sites in BED format, which is used as the input of option "--known-splicesite-infile" during HISAT2 mapping. The official HISAT2 website describes how to create it.
- Example:
annotation_splice_sites=~/annotation/Human/hg38/ref_all_spsites_hg38.bed
- Annotation of intronic 4 bp flanking splice sites
- Example:
annotation_intronic_4site=~/annotation/Human/hg38/hg38_intronic_4site.bed
- Annotation of transcribed strands of genes
- Example:
annotation_gene_transcribed_strands=~/annotation/Human/hg38/ref_UCSC_refFlat.bed
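
Since every entry except genome_build_version points to a file or folder, a quick existence check can catch path typos before a long run. The following is a minimal sketch and not part of RADAR: it assumes that RADAR.conf holds one key=value pair per line, as in the examples above, and the function name `check_conf_paths` is hypothetical.

```shell
#!/bin/sh
# Report every key in a key=value config file whose value is not an existing
# file or directory. A leading "~" is expanded to $HOME. Blank lines, lines
# starting with "#" and the key genome_build_version (not a path) are skipped.
check_conf_paths() (
    conf_file=$1
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case $key in
            ''|'#'*|genome_build_version) continue ;;
        esac
        case $value in
            '~'/*) value=$HOME${value#?} ;;   # drop the "~", prepend $HOME
        esac
        [ -e "$value" ] || printf 'missing: %s=%s\n' "$key" "$value"
    done < "$conf_file"
)
```

Calling `check_conf_paths RADAR.conf` prints one line per entry whose path does not exist and prints nothing when all paths are found.
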
All genome annotations are in BED format:
Field | Description |
---|---|
chrom | Chromosome |
start | Start position |
end | End position |
name | Repeat name or gene symbol/gene name |
score | Smith-Waterman alignment score for repeat region |
strand | + or - for strand |
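
The six-column layout in the table can be checked with a short awk filter. The following is a minimal sketch and not part of RADAR: the function name `check_bed6` is hypothetical, and it assumes tab-separated files that follow exactly the layout above, so an annotation file with a different column layout would be flagged.

```shell
#!/bin/sh
# Print every line of a BED file that breaks the six-column layout:
# fewer than 6 tab-separated fields, non-numeric coordinates,
# a start greater than the end, or a strand other than "+" or "-".
check_bed6() (
    awk -F '\t' '
        NF < 6 ||
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ ||
        $2 + 0 > $3 + 0 ||
        ($6 != "+" && $6 != "-") {
            printf "line %d: %s\n", NR, $0
        }
    ' "$1"
)
```

Calling `check_bed6 Alu.bed` prints nothing when every line passes the checks.
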
The RADAR pipeline breaks down into three main steps; read mapping and RNA-editing calling are integrated into one part during operation:
- For paired-end RNA-seq data:
COMMAND: ./RADAR read_mapping_and_RNA_editing_calling -1 "full_path_of_fastq1" -2 "full_path_of_fastq2" --stranded "true/false" -n "outname" -o "output_dir" -t "maximum_threads"
- For single-end RNA-seq data:
COMMAND: ./RADAR read_mapping_and_RNA_editing_calling -s "full_path_of_fastq" --stranded "true/false" -n "outname" -o "output_dir" -t "maximum_threads"
-s | --single | -single
: FASTQ file for the single-end RNA-seq data.
-1 | --fq1 | -fq1
and -2 | --fq2 | -fq2
: FASTQ files for the paired-end RNA-seq data.
--rna-strandness | -rna-strandness
: The strand-specific information used for HISAT2 mapping. For single-end reads, use F or R. For paired-end reads, use either FR or RF. A detailed description of this option is available in the HISAT2 manual.
-n | --outname | -outname
: The file name prefix for the RNA-editing results. Three result files are created in the output directory: "outname_Alu.vcf", "outname_Repetitive_non_Alu.vcf" and "outname_Non_repetitive.vcf".
-o | --outdir | -outdir
: Output directory of the results.
-t | --thread | -thread
: Maximum threads used for computation.
-h | --help | -help
: Print help information.
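
After this part finishes, the three result files named in the description of `-n` can be checked for a given sample. The following is a minimal sketch and not part of RADAR: the function name `check_radar_outputs` is hypothetical, and it only tests that the files exist, not that their content is valid.

```shell
#!/bin/sh
# Check that the three result files for one outname exist in outdir.
# Prints one line per missing file and returns a non-zero status if any
# file is missing.
check_radar_outputs() (
    outdir=$1
    outname=$2
    status=0
    for suffix in Alu Repetitive_non_Alu Non_repetitive; do
        file=$outdir/${outname}_$suffix.vcf
        if [ ! -e "$file" ]; then
            printf 'missing: %s\n' "$file"
            status=1
        fi
    done
    exit "$status"   # leaves only the subshell that forms the function body
)
```

For example, `check_radar_outputs output_dir s1_rep1` succeeds silently when all three files for the outname s1_rep1 are present.
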
COMMAND: ./RADAR RNA_editing_summarization -i "outdir_of_PART1" -o "file_of_result"
-i | --inputdir | -inputdir
: The directory of the PART1 read mapping and RNA-editing results.
-o | --output | -output
: Full path of the output Excel file.
-h | --help | -help
: Print help information.
COMMAND: ./RADAR histogram -i "outdir_of_PART1" -n "outnames_of_replicates_from_PART1" -o "file_of_plot"
-i | --inputdir | -inputdir
: The directory of the PART1 read mapping and RNA-editing results.
-n | --outname_of_replicates | -outname_of_replicates
: The outnames of the PART1 RNA-editing results for multiple replicates from the same treatment. Separate the outnames with commas, for example, "s1_rep1,s1_rep2,s1_rep3".
-o | --output | -output
: Full path of the PDF file for the histogram.
-h | --help | -help
: Print help information.
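
The comma-separated list for `-n` can be assembled from the result files in the PART1 output directory. The following is a minimal sketch and not part of RADAR: the function name `replicate_outnames` is hypothetical, and it assumes that the replicates of one treatment share a common outname prefix. It matches on the "_Non_repetitive.vcf" suffix because "_Alu.vcf" is also the ending of the "_Repetitive_non_Alu.vcf" files.

```shell
#!/bin/sh
# Print the comma-separated outnames of all samples in outdir whose outname
# starts with the given prefix, derived from the "*_Non_repetitive.vcf" files.
replicate_outnames() (
    outdir=$1
    prefix=$2
    list=
    for file in "$outdir"/"$prefix"*_Non_repetitive.vcf; do
        [ -e "$file" ] || continue          # no match: the pattern stays literal
        name=${file##*/}                    # strip the directory
        name=${name%_Non_repetitive.vcf}    # strip the result suffix
        list=${list:+$list,}$name
    done
    printf '%s\n' "$list"
)
```

With results for s1_rep1 and s1_rep2 in output_dir, `replicate_outnames output_dir s1_` prints `s1_rep1,s1_rep2`, which can be passed as `-n "$(replicate_outnames output_dir s1_)"`.
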
COMMAND: ./RADAR Manhattan_plot -i "outdir_of_PART1" --RNA_editing_type "RNA_editing_type" -n "outname_of_samples_from_PART1" -c "colors_of_samples_in_the_plot" -o "file_of_plot"
-i | --inputdir | -inputdir
: The directory of the PART1 read mapping and RNA-editing results.
--RNA_editing_type | -RNA_editing_type
: The RNA-editing type of interest for the Manhattan plot, selected from the twelve types of RNA editing: A-to-C, A-to-G, A-to-U, C-to-A, C-to-G, C-to-U, G-to-A, G-to-C, G-to-U, U-to-A, U-to-C, U-to-G.
-n | --outname_of_samples | -outname_of_samples
: Outnames of samples from the PART1 RNA-editing results. Separate the outnames with commas, for example, "s1_rep1,s1_rep2,s1_rep3,s2_rep1,s2_rep2,s2_rep3".
-c | --color_of_samples | -color_of_samples
: Colors in hex RGB format for the dots of the samples in the plot. Colors should be enclosed in double quotation marks and separated by commas (,). For example, "#919191,#919191,#FF3F00,#FF3F00,#FF3F00". For multiple samples, provide one color per sample in matching order; for one sample, provide two colors to distinguish adjacent chromosomes.
-o | --output | -output
: Full path of the PDF file for the Manhattan plot.
-h | --help | -help
: Print help information.
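
The rules for `--RNA_editing_type`, `-n` and `-c` described above can be checked before plotting. The following is a minimal sketch and not part of RADAR: the function name `check_manhattan_args` is hypothetical, and the color rule is read as one color per sample for several samples and exactly two colors for a single sample.

```shell
#!/bin/sh
# Succeed only if the editing type is one of the twelve X-to-Y types, every
# color is a "#RRGGBB" hex value, and the number of colors fits the samples.
check_manhattan_args() (
    edit_type=$1
    samples=$2
    colors=$3

    # X and Y must be different bases out of A, C, G, U.
    valid=no
    case $edit_type in
        [ACGU]-to-[ACGU]) [ "${edit_type%%-*}" = "${edit_type##*-}" ] || valid=yes ;;
    esac
    if [ "$valid" = no ]; then
        echo "unknown RNA-editing type: $edit_type"
        exit 1
    fi

    set -f      # no filename expansion while splitting on commas
    IFS=,
    n_samples=0
    for sample in $samples; do
        n_samples=$((n_samples + 1))
    done
    n_colors=0
    for color in $colors; do
        case $color in
            '#'[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f])
                n_colors=$((n_colors + 1)) ;;
            *)
                echo "not a hex RGB color: $color"
                exit 1 ;;
        esac
    done

    expected=$n_samples
    [ "$n_samples" -gt 1 ] || expected=2
    if [ "$n_colors" -ne "$expected" ]; then
        echo "expected $expected colors, got $n_colors"
        exit 1
    fi
)
```

For example, `check_manhattan_args A-to-G "s1_rep1,s2_rep1" "#919191,#FF3F00"` succeeds, while passing a single color for the same two samples fails with a message.
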
Copyright (C) 2020 YangLab. Licensed under GPLv3 for open source use; contact YangLab (yanglab@picb.ac.cn) for commercial use.