- Introduction
- Download
- Change log
  - Version 1.07
  - Version 1.06
  - Version 1.05
  - Version 1.04
  - Version 1.03
  - Version 1.02
  - Version 1.01
  - Version 1.00
  - Release candidate (RC) version 1.2
  - Release candidate (RC) version 1.1
  - Release candidate (RC) version 1.0
- Prerequisites
  - Install required Perl packages
  - Install required software
  - Download required database
- Usage
  - Parameters
  - Input file
  - Gene file
  - Config file
- Example
  - Download example data
  - Example usage
- Results
# Introduction #
Exome sequencing is one of the most cost-efficient sequencing approaches for genome research on coding regions. Its primary applications include the detection of single nucleotide polymorphisms, somatic mutations, small indels, and copy number variations. Exome sequencing data also offer some less obvious data mining opportunities, such as the extraction of mitochondrial and viral sequences. Another less explored genomic aberration that can be detected through exome sequencing is the internal exon deletion (IED), that is, the deletion of one or more consecutive exons within a gene.
IEDs have biological importance in cancer and may remove important regulatory mechanisms or protein-protein interactions. Given the large amount of publicly available exome sequencing data accumulated over the last few years, a method that can efficiently detect such deletions would benefit the medical research community greatly and provide a means to rapidly find new internal deletion candidates. Thus, we designed ExonDel, a tool aimed at detecting IEDs through exome sequencing data.
ExonDel is written in Perl and R and is freely available for public use. It can be downloaded from the ExonDel repository on GitHub.
# Download #
You can download ExonDel directly from GitHub with the following command (if git is already installed on your computer).
#The source code of ExonDel will be downloaded to your current directory
git clone https://github.com/slzhao/ExonDel.git
Alternatively, you can download the zip file of ExonDel from GitHub.
#The zip file of ExonDel will be downloaded to your current directory
wget https://github.com/slzhao/ExonDel/archive/master.zip -O exonDel.zip
#A directory named ExonDel-master will be created and the source code will be extracted there
unzip exonDel.zip
# Prerequisites #
## Install required Perl packages ##
Perl is a highly capable, widely used, feature-rich programming language. It can be downloaded from the Perl website.
If Perl is already installed on your computer, no additional Perl modules are needed to run ExonDel in most cases. You can run the following commands to make sure all the required modules are installed.
#go to the directory where your ExonDel software is
#and test whether all the required modules have been installed
bash test.modules
Successful output looks like this:
ok File::Basename
ok File::Copy
ok FindBin
ok Getopt::Long
ok HTML::Template
ok Report::Generate
ok threads
ok threads::shared
Otherwise, if for example the File::Basename package is missing, the output may look like this:
fail File::Basename
ok File::Copy
ok FindBin
ok Getopt::Long
ok HTML::Template
ok Report::Generate
ok threads
ok threads::shared
Then you need to install the missing packages from CPAN. A script is also provided to make package installation more convenient.
#if File::Basename is missing
bash install.modules File::Basename
## Install required software ##
R is a free software environment for statistical computing and graphics. It can be downloaded from the R website.
After you install R and add the R binary to your PATH, ExonDel can find and use R automatically. Alternatively, you can edit the ExonDel.cfg file in the software directory to tell the program where R is located on your computer. This is the line to modify:
#where the R bin file is
RBin=R
SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format. It can be downloaded from the SAMtools website.
After you install SAMtools and add the SAMtools binary to your PATH, ExonDel can find and use SAMtools automatically. Alternatively, you can edit the ExonDel.cfg file in the software directory to tell the program where SAMtools is located on your computer. This is the line to modify:
#where the SAMtools bin file is
samtoolsBin=samtools
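
Before running ExonDel, it can help to confirm that both programs are actually reachable. The following is a minimal sketch, not part of ExonDel itself: `resolve_tool` is a hypothetical helper that takes the value you would put in `RBin` or `samtoolsBin` (a bare name or an explicit path) and reports where it resolves to.

```python
import shutil


def resolve_tool(configured):
    """Return the full path of an executable, or None if it cannot be found.

    `configured` is the value from ExonDel.cfg, for example "R", "samtools",
    or an explicit path such as "/opt/samtools/bin/samtools".
    """
    # shutil.which searches PATH for bare names and checks explicit paths directly.
    return shutil.which(configured)


if __name__ == "__main__":
    # The keys mirror the two ExonDel.cfg entries described above.
    for key, value in {"RBin": "R", "samtoolsBin": "samtools"}.items():
        path = resolve_tool(value)
        status = path if path else "NOT FOUND (install it or edit ExonDel.cfg)"
        print(f"{key}={value} -> {status}")
```

If either entry is reported as not found, set the corresponding line in ExonDel.cfg to the full path of the binary.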
## Download required database ##
#bed file:
#Column1 Column2 Column3
Chromosome StartPosition EndPosition
#bed file, if you want to select some genes:
#Column1 Column2 Column3 Column4
Chromosome StartPosition EndPosition Gene
#gtf file (with header):
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
23 NM_005509 chr5 + 118407083 118584822 118407264 118582914 3 118407083,118433673,1184376292, 118407351,118433799,118437701, 0 DMXL1 cmpl cmpl 0,0,0,
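
To make the two layouts above concrete, here is a small parsing sketch. It is an illustration only and not ExonDel's own parser: the function names are hypothetical, fields are assumed to be separated by whitespace, and the annotation row used in the usage example is made up rather than taken from a real transcript.

```python
def parse_bed_line(line):
    """Split one BED line into (chrom, start, end, gene); gene is None if absent."""
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    gene = fields[3] if len(fields) > 3 else None
    return chrom, start, end, gene


def parse_refseq_line(line):
    """Extract transcript, gene, chromosome and exon intervals from one annotation row.

    Column order follows the header shown above:
    bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount
    exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
    """
    fields = line.split()
    exon_count = int(fields[8])
    # exonStarts and exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return {
        "transcript": fields[1],
        "chrom": fields[2],
        "gene": fields[12],
        "exons": list(zip(starts, ends)),
    }


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    print(parse_bed_line("chr1 100 200 GENE1"))
    row = "1 NM_000000 chr1 + 100 900 150 850 2 100,500, 200,900, 0 GENE1 cmpl cmpl 0,0,"
    print(parse_refseq_line(row))
```

The consistency check on `exonCount` is a cheap way to catch rows whose coordinate lists were truncated or mistyped.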
The .fa files can be downloaded from the UCSC website. Because UCSC does not provide the full genome as a single .fa file, you may need to download chromFa.tar.gz and combine the per-chromosome files into one .fa file, or download hg19.2bit and convert it to a .fa file with twoBitToFa.
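
As a sketch of the first option, the per-chromosome files extracted from chromFa.tar.gz can be concatenated into a single FASTA file. The helper below is hypothetical and assumes the extracted files are named `chr*.fa`, sit in one directory, and each end with a newline.

```python
import glob
import os
import shutil


def combine_fasta(chrom_dir, output_path):
    """Concatenate every chr*.fa file in chrom_dir into one FASTA file.

    Returns the list of input files in the order they were written.
    """
    parts = sorted(glob.glob(os.path.join(chrom_dir, "chr*.fa")))
    if not parts:
        raise FileNotFoundError(f"no chr*.fa files found in {chrom_dir}")
    with open(output_path, "wb") as out:
        for part in parts:
            # Stream each file so whole chromosomes are never held in memory.
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


if __name__ == "__main__":
    # Hypothetical locations; adjust to where chromFa.tar.gz was extracted.
    if os.path.isdir("chromFa"):
        written = combine_fasta("chromFa", "hg19.fa")
        print(f"combined {len(written)} files into hg19.fa")
```

Files are written in plain lexicographic order (chr1, chr10, chr11, ...). For the second option, the usual invocation is `twoBitToFa hg19.2bit hg19.fa`.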
After the .gtf (refseq) file, .bed file, and .fa file have been downloaded, you also need to modify the config file so that ExonDel can find them. See the config file section for more information.
# Usage #
ExonDel is run as follows:
perl ExonDel.pl -i bamfileList -o outputDirectory [-g geneList] [-c configFile] [-t threads]
## Parameters ##
-i    input bam filelist    Required. Input file listing all bam files to be analyzed, with their paths.
-o    output directory      Required. Output directory for ExonDel results. If the directory doesn't exist, it will be created.
-g    selected gene list    Optional. Genes of interest. If specified, only these genes will be analyzed by ExonDel.
-c    config file           Optional. If not specified, ExonDel.cfg in the ExonDel directory will be used.
-t    threads               Optional. Number of threads used in the analysis. The default value is 4. This parameter is only valid for the analysis of bam files.
-ra   re-analysis           Optional. If specified, the analysis will be performed again.
-h    help                  Optional. Show help information.
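
When ExonDel is launched from a script, the command line can be assembled from these parameters. The sketch below uses only the flags listed above. The helper name is hypothetical, and it assumes that `-ra` is a switch that takes no value.

```python
import subprocess


def build_exondel_command(bam_list, out_dir, gene_list=None, config=None,
                          threads=None, reanalyze=False, script="ExonDel.pl"):
    """Assemble the argument list for one ExonDel run."""
    cmd = ["perl", script, "-i", bam_list, "-o", out_dir]
    if gene_list:
        cmd += ["-g", gene_list]
    if config:
        cmd += ["-c", config]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if reanalyze:
        # Assumption: -ra is a flag without an argument.
        cmd.append("-ra")
    return cmd


def run_exondel(**kwargs):
    """Run ExonDel and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_exondel_command(**kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command, it does not run it.
    print(" ".join(build_exondel_command("bams.list", "./result", threads=8)))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.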
## Input file ##
#An example of an input file. The labels are optional.
#Column1 (bam files) Column2 (labels)
sample1File.bam labelForSample1
sample2File.bam labelForSample2
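
A quick way to sanity-check such a list before a long run is to parse it yourself. The sketch below is not ExonDel's own reader. It assumes that the two columns are separated by whitespace, that bam paths contain no spaces, and that lines starting with `#` can be skipped.

```python
import os


def read_bam_list(lines):
    """Return a list of (bam_path, label) pairs; label is None when omitted."""
    samples = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        label = fields[1].strip() if len(fields) > 1 else None
        samples.append((fields[0], label))
    return samples


def missing_bams(samples):
    """Return the bam paths from the list that do not exist on disk."""
    return [path for path, _ in samples if not os.path.isfile(path)]


if __name__ == "__main__":
    example = [
        "#Column1 (bam files) Column2 (labels)",
        "sample1File.bam labelForSample1",
        "sample2File.bam",
    ]
    for path, label in read_bam_list(example):
        print(path, label)
```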
## Config file ##
The user needs to modify the following lines in ExonDel.cfg so that ExonDel can find the .gtf, .bed, and .fa files.
#reference .bed file
bedfile=
#reference .gtf file
refseq=
#reference .fa file
reffa=
If these files are example.bed, example.gtf, and hg19.fa, located in /reference/, the lines should be changed to:
#reference .bed file
bedfile=/reference/example.bed
#reference .gtf file
refseq=/reference/example.gtf
#reference .fa file
reffa=/reference/hg19.fa
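
Since an empty or mistyped reference entry is an easy mistake to make, a small check of the config file can save a failed run. The following sketch assumes that ExonDel.cfg consists of `key=value` lines and `#` comment lines, as in the excerpts above. The function names are hypothetical.

```python
def parse_cfg(text):
    """Parse key=value lines into a dict, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def unset_references(settings, keys=("bedfile", "refseq", "reffa")):
    """Return the reference keys that are missing or left empty."""
    return [key for key in keys if not settings.get(key)]


if __name__ == "__main__":
    cfg = """#reference .bed file
bedfile=/reference/example.bed
#reference .gtf file
refseq=/reference/example.gtf
#reference .fa file
reffa=
"""
    print(unset_references(parse_cfg(cfg)))
```

In this example only `reffa` is reported, because its value was left empty.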
The user should also check the following lines in ExonDel.cfg to make sure the results are reliable.
#Minimal percent of covered base pairs for each exon
exon_bp_cover_threshold=0.1
#Minimal percent of covered exons for each gene. 1 means 100%
overall_exon_count_threshold=1
exon_bp_cover_threshold=0.1 means an exon is considered in exon deletion detection only when at least 10% of its base pairs are covered by the bed file. overall_exon_count_threshold=1 means a gene is considered in exon deletion detection only when 100% of its exons pass the exon_bp_cover_threshold above. If these parameters are set too high, no gene may pass, which can cause an error.
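
The two thresholds can be illustrated with a short sketch. This is one reading of the description above and not ExonDel's actual implementation: intervals are treated as half-open `(start, end)` pairs on a single chromosome, every exon is assumed to have `end > start`, and the bed intervals are assumed not to overlap each other.

```python
def covered_fraction(exon, targets):
    """Fraction of an exon's base pairs that fall inside the bed intervals."""
    start, end = exon
    covered = sum(
        max(0, min(end, t_end) - max(start, t_start))
        for t_start, t_end in targets
    )
    return covered / (end - start)


def gene_passes(exons, targets, exon_bp_cover_threshold=0.1,
                overall_exon_count_threshold=1.0):
    """True if a large enough share of the gene's exons is sufficiently covered."""
    passing = sum(
        covered_fraction(exon, targets) >= exon_bp_cover_threshold
        for exon in exons
    )
    return passing / len(exons) >= overall_exon_count_threshold


if __name__ == "__main__":
    # Hypothetical coordinates: the first exon is half covered, the second not at all.
    exons = [(100, 200), (300, 400)]
    targets = [(150, 250)]
    print(gene_passes(exons, targets))
    print(gene_passes(exons, targets, overall_exon_count_threshold=0.5))
```

With the default thresholds the example gene is rejected because only one of its two exons is covered. Lowering overall_exon_count_threshold to 0.5 lets it through.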
# Example #
The example files can be downloaded from the ExonDel website on SourceForge.
You need to download and extract them to a directory. Example commands for downloading the data and running ExonDel on it are given below.
## Download example data ##
#download and extract example data into exampleDir
mkdir exampleDir
cd exampleDir
wget http://sourceforge.net/projects/exondel/files/example.tar.gz/download -O example.tar.gz
tar zxvf example.tar.gz
ls
## Example usage ##
We can use all genes for exon deletion detection.
#assume ExonDel.pl is in the directory ExonDel-master/ and the examples are in exampleDir
cd exampleDir
perl path_to/ExonDel-master/ExonDel.pl -i exampleBams.list -c ExonDel.example.cfg -o ./result1
You should see standard output messages like the following:
#[Fri Dec 20 10:28:00 2013] All genes will be used
#[Fri Dec 20 10:28:00 2013] Loading BED file
#[Fri Dec 20 10:28:00 2013] Finish BED file (cover 18683 base pairs)
#[Fri Dec 20 10:28:00 2013] Loading RefSeq file
#[Fri Dec 20 10:28:00 2013] Finish RefSeq file
#[Fri Dec 20 10:28:00 2013] Loading fasta file and caculating GC content
#[Fri Dec 20 10:28:02 2013] Caculating GC content in 5: 43 exons
#[Fri Dec 20 10:28:04 2013] Caculating GC content in 6: 24 exons
#[Fri Dec 20 10:28:05 2013] Caculating GC content in 13: 35 exons
#[Fri Dec 20 10:28:05 2013] Finish fasta file
#[Fri Dec 20 10:28:05 2013] Loading genesPassQCwithGC.bed
#[Fri Dec 20 10:28:05 2013] Processing bam files
#[Fri Dec 20 10:28:06 2013] Thread 1 stared
#[Fri Dec 20 10:28:06 2013] Thread 1 processing example1.bam example1
#[Fri Dec 20 10:28:06 2013] Thread 1 processing example2.bam example2
#[Fri Dec 20 10:28:06 2013] Thread 1 processing example3.bam example3
#[Fri Dec 20 10:28:06 2013] Thread 2 stared
#[Fri Dec 20 10:28:06 2013] Thread 2 processing example4.bam example4
#[Fri Dec 20 10:28:06 2013] Thread 3 stared
#[Fri Dec 20 10:28:06 2013] Thread 4 stared
#[Fri Dec 20 10:28:06 2013] Thread 1 finished
#[Fri Dec 20 10:28:07 2013] Thread 2 finished
#[Fri Dec 20 10:28:07 2013] Thread 3 finished
#[Fri Dec 20 10:28:07 2013] Thread 4 finished
#[Fri Dec 20 10:28:07 2013] Finish bam file
#[Fri Dec 20 10:28:07 2013] Analyzing Exon Deletion
#[Fri Dec 20 10:28:07 2013] Success!
We can also select only some genes for exon deletion detection.
perl path_to/ExonDel-master/ExonDel.pl -i exampleBams.list -c ExonDel.example.cfg -g genelist.txt -o ./result2
You will see standard output messages like the following:
#[Fri Dec 20 10:34:34 2013] Only the genes in genelist.txt will be used
#[Fri Dec 20 10:34:34 2013] GC adjustment will not be performed, and the constant cutoffs in config file will be used
#[Fri Dec 20 10:34:34 2013] Loading BED file
#[Fri Dec 20 10:34:34 2013] Finish BED file (cover 6655 base pairs)
#[Fri Dec 20 10:34:34 2013] Loading RefSeq file
#[Fri Dec 20 10:34:34 2013] Finish RefSeq file
#[Fri Dec 20 10:34:34 2013] Loading fasta file and caculating GC content
#[Fri Dec 20 10:34:38 2013] Caculating GC content in 6: 24 exons
#[Fri Dec 20 10:34:39 2013] Caculating GC content in 13: 35 exons
#[Fri Dec 20 10:34:39 2013] Finish fasta file
#[Fri Dec 20 10:34:39 2013] Loading genesPassQCwithGC.bed
#[Fri Dec 20 10:34:39 2013] Processing bam files
#[Fri Dec 20 10:34:40 2013] Thread 1 stared
#[Fri Dec 20 10:34:40 2013] Thread 1 processing example1.bam example1
#[Fri Dec 20 10:34:40 2013] Thread 1 processing example2.bam example2
#[Fri Dec 20 10:34:40 2013] Thread 1 processing example3.bam example3
#[Fri Dec 20 10:34:40 2013] Thread 2 stared
#[Fri Dec 20 10:34:40 2013] Thread 2 processing example4.bam example4
#[Fri Dec 20 10:34:40 2013] Thread 3 stared
#[Fri Dec 20 10:34:40 2013] Thread 4 stared
#[Fri Dec 20 10:34:40 2013] Thread 1 finished
#[Fri Dec 20 10:34:40 2013] Thread 2 finished
#[Fri Dec 20 10:34:40 2013] Thread 3 finished
#[Fri Dec 20 10:34:40 2013] Thread 4 finished
#[Fri Dec 20 10:34:40 2013] Finish bam file
#[Fri Dec 20 10:34:40 2013] Analyzing Exon Deletion
#[Fri Dec 20 10:34:41 2013] Success!
# Results #
exonDelsBy1.csv to exonDelsBy9.csv contain the deletions found by a moving window of 1 to 9 exons.
exonDelsCutoffs.csv contains the cutoffs for every bam file.
The figures directory contains example figures for exon deletions found by the different moving windows.
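
After a run, a short script can confirm that the output directory contains everything listed above. The helper names below are hypothetical, and the sketch only checks that the files exist. It makes no assumption about the columns inside the CSV files.

```python
import os


def expected_result_files(out_dir, max_window=9):
    """Paths of the CSV files described above for windows of 1..max_window exons."""
    names = [f"exonDelsBy{n}.csv" for n in range(1, max_window + 1)]
    names.append("exonDelsCutoffs.csv")
    return [os.path.join(out_dir, name) for name in names]


def missing_results(out_dir):
    """Return the expected outputs that are absent from out_dir."""
    missing = [p for p in expected_result_files(out_dir) if not os.path.isfile(p)]
    figures = os.path.join(out_dir, "figures")
    if not os.path.isdir(figures):
        missing.append(figures)
    return missing


if __name__ == "__main__":
    # "./result1" is the output directory used in the example run.
    for path in missing_results("./result1"):
        print("missing:", path)
```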