TelSeq is a software tool that estimates telomere length from whole-genome sequencing data (BAM files).
The latest development version is available from our git repository: git://github.com/zd1/telseq.git
The software is implemented in C++.
Citation:
Estimating telomere length from whole genome sequence data
Zhihao Ding; Massimo Mangino; Abraham Aviv; Tim Spector; Richard Durbin. Nucleic Acids Research 2014; doi: 10.1093/nar/gku181 http://nar.oxfordjournals.org/content/42/9/e75
- the bamtools library (https://github.com/pezmaster31/bamtools)
- A modern version of GCC (version 4.8 or above). You can check the version with "gcc --version". If multiple GCC versions are installed on your system, set environment variables pointing to one that is version 4.8 or above, e.g. in bash:
export CXX=/path/to/gcc/gcc-4.8.1/bin/g++
export CC=/path/to/gcc/gcc-4.8.1/bin/gcc
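The version requirement can also be checked in the shell before building. This is a minimal sketch: the helper name `version_ge` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD sort).

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the compiler's reported version against the 4.8 minimum.
# `gcc -dumpversion` prints only the version number (newer releases may
# print just the major version).
if command -v gcc >/dev/null 2>&1; then
    if version_ge "$(gcc -dumpversion)" 4.8; then
        echo "gcc is new enough"
    else
        echo "gcc is older than 4.8; set CC and CXX to a newer compiler"
    fi
fi
```

On systems where `gcc` is an alias for another compiler, the reported number may not be a GCC version, so treat the result as a hint only.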
One easy way to install a newer GCC is to use Homebrew:
# install homebrew if you don't have it
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
# install GCC
brew install gcc
Go to the src directory and run autogen.sh to generate the configure script:
./autogen.sh
Then run
./configure
make
The executable binary will be at src/Telseq/telseq. If bamtools is not installed in a system location, you can specify its location with:
./configure --with-bamtools=/path/to/bamtools
The /path/to/bamtools directory is the one that contains the 'lib' and 'include' subdirectories.
telseq
telseq a.bam b.bam
The bamlist file should contain a single column, with one BAM path per row, e.g.
/path/to/a.bam
/path/to/b.bam
telseq -f bamlist
cat bamlist | telseq
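A bamlist in this format can be generated rather than written by hand. The sketch below assumes the BAMs sit under a single directory tree and end in `.bam`; the function name `make_bamlist` and the paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: print one path per line for every *.bam file under $1,
# in sorted order. Index files such as *.bam.bai are not matched.
make_bamlist() {
    find "$1" -type f -name '*.bam' | sort
}

# Usage (paths are placeholders):
#   make_bamlist /path/to/bams > bamlist
#   telseq -f bamlist
```

Passing an absolute directory yields absolute paths, which keeps the list valid regardless of the working directory.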
By default the result is printed to stdout. To write it to a file instead, use the '-o' option to specify a file path, e.g.
telseq -o /path/to/output a.bam b.bam c.bam
This can also be achieved by redirecting the output to a file using '>', e.g.
telseq a.bam b.bam c.bam > /path/to/output
The software also prints its running status to stderr. To keep this separate from the results on stdout, redirect the log to a file, e.g.
telseq a.bam b.bam c.bam 2>outputlog
The '-m' option merges results from read groups by taking a weighted mean. However, it is beneficial
to first run without -m, which outputs the result per lane and gives an idea of the inter-lane
variation. The merging can be done afterwards.
telseq -m a.bam > output
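Merging afterwards can be done on the per-read-group output with a short awk script. This is only a sketch under stated assumptions, not TelSeq's own merging code: it assumes the output is tab-delimited with a header line, that the weight is the Total column, and that the estimate column's name contains "ESTIMATE". The function name `merge_read_groups` is hypothetical.

```shell
# Hypothetical helper: weighted mean of the length estimate per sample.
# Assumptions: tab-delimited input with a header line, weight = Total column,
# estimate column name contains "ESTIMATE".
merge_read_groups() {
    awk -F '\t' '
        NR == 1 {
            # Locate the needed columns by name from the first header line.
            for (i = 1; i <= NF; i++) {
                if ($i == "Sample") s = i
                if ($i == "Total") t = i
                if ($i ~ /ESTIMATE/) e = i
            }
            next
        }
        # Use only rows with a numeric estimate; this also skips any
        # repeated header lines from later BAMs.
        $e ~ /^[0-9]+(\.[0-9]+)?$/ { sum[$s] += $t * $e; w[$s] += $t }
        END {
            for (k in sum) if (w[k] > 0) printf "%s\t%.4f\n", k, sum[k] / w[k]
        }
    ' "$@" | sort
}

# Usage (file name is a placeholder):
#   telseq a.bam > per_lane.txt
#   merge_read_groups per_lane.txt
```

Keeping the per-lane file makes it possible to inspect inter-lane variation first and merge only once the lanes look consistent.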
Column | Definitions |
---|---|
ReadGroup | read group, defined by the RG tag in the BAM header. |
Library | sequencing library that the read group belongs to. |
Sample | sample name, defined by the SM tag in the BAM header. |
Total | total number of reads in this read group. |
Mapped | total number of mapped reads, i.e. reads without SAM flag 0x4 (unmapped) set. |
Duplicates | total number of duplicate reads, SAM flag 0x400. |
LENGTH_ESTIMATE | estimated telomere length. |
TEL0 | read counts for reads containing no TTAGGG/CCCTAA repeats. |
TEL1 | read counts for reads containing exactly 1 TTAGGG/CCCTAA repeat. |
TELn | read counts for reads containing exactly n TTAGGG/CCCTAA repeats. |
TEL16 | read counts for reads containing 16 TTAGGG/CCCTAA repeats. |
GC0 | read counts for reads with GC between 40%-42%. |
GC1 | read counts for reads with GC between 42%-44%. |
GCn | read counts for reads with GC between (40%+n*2%)-(40%+(n+1)*2%). |
GC9 | read counts for reads with GC between 58%-60%. |
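
The TELn and GCn definitions can be illustrated on single read sequences. The helpers below are a sketch, not TelSeq's implementation: the names `count_repeats` and `gc_bin` are hypothetical, repeats are assumed to be counted as non-overlapping matches on the uppercase read as given, and each GC bin is assumed to include its lower bound but not its upper bound.

```shell
# Hypothetical helper: number of TTAGGG/CCCTAA hexamers in a read sequence.
# gsub returns the count of non-overlapping matches it replaced.
count_repeats() {
    printf '%s\n' "$1" | awk '{ print gsub(/TTAGGG|CCCTAA/, "") }'
}

# Hypothetical helper: GC bin index n for a read, or "NA" if the GC content
# falls outside the 40%-60% range covered by GC0..GC9.
gc_bin() {
    printf '%s\n' "$1" | awk '{
        len = length($0)              # read length, taken before gsub edits $0
        gc = gsub(/[GCgc]/, "")       # number of G and C bases
        if (len == 0) { print "NA"; exit }
        pct = 100 * gc / len
        if (pct < 40 || pct >= 60) print "NA"
        else print int((pct - 40) / 2)
    }'
}

# Usage:
#   count_repeats TTAGGGTTAGGGAC   # two repeats, so the read counts toward TEL2
#   gc_bin GGCCAATT                # 50% GC, so the read counts toward GC5
```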
By default a header line is printed for each BAM. This can be suppressed with the '-H' option, which is useful when scanning multiple BAMs whose output should be merged together, e.g.
telseq -H a.bam b.bam c.bam > myresult
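If per-BAM result files were already written with their headers, they can still be combined by dropping the repeated header lines. A minimal sketch, assuming every input file starts with exactly one header line; the function name `concat_results` and the file names are hypothetical.

```shell
# Hypothetical helper: concatenate result files, keeping only the first header.
# FNR is the line number within the current file, NR the overall line number,
# so "FNR == 1 && NR != 1" matches the first line of every file but the first.
concat_results() {
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Usage (file names are placeholders):
#   concat_results a.txt b.txt c.txt > myresult
```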
To print only the header, use the '-h' option, e.g.
telseq -h
Please refer to the official website for instructions on installing Docker: https://docs.docker.com/engine/installation/
docker build -t telseq-docker github.com/zd1/telseq
Note that the host path "/path/to/bam/sample.bam" must be replaced with the location of your BAM on the machine where the container runs. The container path "/sample.bam" does not need to be changed.
docker run -v /path/to/bam/sample.bam:/sample.bam telseq-docker /sample.bam
zhihao.ding at gmail.com