ReliableGenome (RG) is a method for partitioning genomes into high- and low-concordance regions with respect to a set of surveyed variant calling (VC) pipelines. RG integrates variant call sets created by multiple pipelines from an arbitrary number of input datasets and interpolates the expected concordance for genomic regions without data, resulting in a genome-wide concordance score. Ultimately, genomic regions of high/low concordance are calculated from this genome-wide signal.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Read more about RG in this paper or have a look at this poster that was presented at the NGS16 conference.
- A genomic partition calculated from 219 deep WGS alignments can be downloaded here(.tbi). The partition contains 2,209,778 concordance intervals located on chromosomes 1,..,22,X,Y.
- Reference sequence: hs37d5 (GRCh37 + decoy)
- Read mapper: bwa + stampy
- Variant callers: GATK HaplotypeCaller, platypus, samtools
- Parameters: wc = 1; wd = -3; tc = td = 0.5; window size x = 1000
- The same partition with LCR and HD regions removed (RG-LCR-HD(.tbi)), see paper.
- The same partition with LCR100 and HD regions removed (RG-LCR100-HD(.tbi)), see paper.
- The same partition with UM75 regions removed (RG-UM75(.tbi)), see paper.
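Because the partitions ship with a .tbi index, they can presumably be queried by region with tabix. The following is a minimal sketch under stated assumptions: tabix (htslib) is installed, the partition file sits next to its index, and chromosome names follow the hs37d5 convention without a "chr" prefix. The function names and the example path are illustrative and not part of RG.

```shell
#!/usr/bin/env bash
# Sketch: list partition intervals that overlap a genomic region.
# Assumption: the downloaded partition is bgzip-compressed and has its .tbi next to it.

make_region() {
  # hs37d5 names chromosomes "1", "2", ..., "X", "Y", so strip a leading "chr" if present.
  local chrom="${1#chr}"
  printf '%s:%s-%s\n' "$chrom" "$2" "$3"
}

query_partition() {
  # Usage: query_partition <partition file> <chrom> <start> <end>
  tabix "$1" "$(make_region "$2" "$3" "$4")"
}

# Example (uncomment after setting the path to the downloaded partition):
# query_partition /path/to/partition.gz 1 1000000 1010000
```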
- maven3
- jdk 1.7+
- Clone source code from github
- Run the bash script build.sh
- If everything worked, you will find a standalone executable JAR in the bin directory and a library JAR (containing only the classes from the RG source tree) in the target directory.
You can run RG via java -jar bin/wtchg-rg-1.0.jar
which will print basic usage information.
Use java -Xmx12g -jar ...
to run RG with more dedicated heap space (recommended).
To join VCF files from different variant callers, use the join command. Run
java -jar bin/wtchg-rg-1.0.jar CalcreliabilitySignals join
to get detailed usage information.
Usage example:
java -Xmx12g -jar bin/wtchg-rg-1.0.jar join -d <GATK.VCF> -dl GATK -d <PLAT.VCF> -dl PLAT -d <SAMT.VCF> -dl SAMT -o <SNV.out.vcf> -oi <INDEL.out.vcf> -dontCheckSort -dropAllFiltered -indelMergeWin 5
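When several samples need to be joined, the usage example above can be wrapped in a small helper. This sketch only assembles the command from the flags shown above. The directory layout (`<calls_dir>/<LABEL>/<sample>.vcf.gz`), the output file names, and the function name `build_join_cmd` are hypothetical and should be adapted to your data.

```shell
#!/usr/bin/env bash
# Sketch: build one "join" command per sample from the flags in the usage example.
# Hypothetical layout: <calls_dir>/<LABEL>/<sample>.vcf.gz with LABEL in GATK, PLAT, SAMT.
JAR="${JAR:-bin/wtchg-rg-1.0.jar}"

build_join_cmd() {
  # Usage: build_join_cmd <calls_dir> <sample> <out_dir>
  # Fills the global array JOIN_CMD with the complete command line.
  local calls="$1" sample="$2" out="$3" label
  JOIN_CMD=(java -Xmx12g -jar "$JAR" join)
  for label in GATK PLAT SAMT; do
    JOIN_CMD+=(-d "$calls/$label/$sample.vcf.gz" -dl "$label")
  done
  JOIN_CMD+=(-o "$out/$sample.SNV.vcf" -oi "$out/$sample.INDEL.vcf"
             -dontCheckSort -dropAllFiltered -indelMergeWin 5)
}

build_join_cmd calls sampleA joined
echo "${JOIN_CMD[@]}"     # print the command for inspection
# "${JOIN_CMD[@]}"        # uncomment to actually run the join
```

Keeping the command in an array avoids quoting problems if a path contains spaces.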
To calculate the genome-wide concordance score signal, use the calc command. Run
java -jar bin/wtchg-rg-1.0.jar CalcreliabilitySignals calc
to get detailed usage information.
Usage example:
java -Xmx12g -jar bin/wtchg-rg-1.0.jar calc -o <OUTDIR> -w 1000 -scoringSchema 1,-3 -thresholds 0.5,0.5 -dontCheckSort -v
Find some test VCF files that are ready to JOIN in data/public/vcf/.
Usage example (please modify paths to vcf/jar files as required):
java -Xmx12g -jar wtchg-rg-1.0.jar calc -o results -w 1000 -scoringSchema 1,-3 -thresholds 0.5,0.5 -createWigs -dontCheckSort -v -d vcf/AW_CRS_1631.DP+MDI.vcf.gz -d vcf/AW_CRS_1632.DP+MDI.vcf.gz -d vcf/AW_CRS_1806.DP+MDI.vcf.gz -d vcf/AW_CRS_1807.DP+MDI.vcf.gz -d vcf/AW_CRS_4103.DP+MDI.vcf.gz -d vcf/AW_CRS_4917.DP+MDI.vcf.gz -d vcf/AW_SC_4654.DP+MDI.vcf.gz -d vcf/AW_SC_4655.DP+MDI.vcf.gz -d vcf/AW_SC_4659.DP+MDI.vcf.gz
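Typing one -d flag per dataset gets tedious as the number of inputs grows. As a rough sketch, the command above could be assembled from all `*.vcf.gz` files in a directory. The variables `VCF_DIR`, `JAR` and `OUT_DIR` and the function `build_calc_cmd` are placeholder names introduced here. The flags are copied from the example above.

```shell
#!/usr/bin/env bash
# Sketch: assemble the "calc" command line with one -d flag per input VCF.
# VCF_DIR, JAR and OUT_DIR are placeholders; adjust them to your setup.
VCF_DIR="${VCF_DIR:-vcf}"
JAR="${JAR:-wtchg-rg-1.0.jar}"
OUT_DIR="${OUT_DIR:-results}"

build_calc_cmd() {
  # Usage: build_calc_cmd <vcf_dir>
  # Fills the global array CALC_CMD; returns non-zero if no VCF files are found.
  local dir="$1" f found=0
  CALC_CMD=(java -Xmx12g -jar "$JAR" calc -o "$OUT_DIR" -w 1000
            -scoringSchema 1,-3 -thresholds 0.5,0.5 -createWigs -dontCheckSort -v)
  for f in "$dir"/*.vcf.gz; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    CALC_CMD+=(-d "$f")
    found=1
  done
  [ "$found" -eq 1 ]
}

if build_calc_cmd "$VCF_DIR"; then
  echo "${CALC_CMD[@]}"       # print the command for inspection
  # "${CALC_CMD[@]}"          # uncomment to actually run RG
else
  echo "no *.vcf.gz files found in $VCF_DIR" >&2
fi
```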
Please note that the "-createWigs" switch creates WIG files containing the genome-wide (interpolated) score signal and a signal showing the number of contributing datasets per position ("power signal"). The produced WIG files are too large to be loaded into a genome browser directly and should be converted, e.g., to the BigWig format, using the following command line:
wigToBigWig <WIG> <CHRSIZES> <BIGWIG>
(A chromosome sizes file is provided here for convenience.)
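If a run produces several WIG files, the conversion can be looped over the output directory. This is a minimal sketch that assumes UCSC's wigToBigWig is on the PATH and that the WIG files end in `.wig`. The function names and the example paths are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: convert every *.wig file in a directory to BigWig.
# Assumption: RG's WIG output uses the .wig extension; check your output directory.

bigwig_name() {
  # Map <name>.wig to <name>.bw
  printf '%s\n' "${1%.wig}.bw"
}

convert_wigs() {
  # Usage: convert_wigs <wig_dir> <chrom_sizes_file>
  local dir="$1" sizes="$2" wig
  for wig in "$dir"/*.wig; do
    [ -e "$wig" ] || continue   # skip the literal pattern when nothing matches
    wigToBigWig "$wig" "$sizes" "$(bigwig_name "$wig")"
  done
}

# Example (uncomment after adjusting the paths):
# convert_wigs results chrom.sizes
```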
Please cite our paper when using RG:
Niko Popitsch, WGS500 Consortium, Anna Schuh, and Jenny C. Taylor. ReliableGenome: Annotation of Genomic Regions with High/Low Variant Calling Concordance. Bioinformatics, 2016. doi:10.1093/bioinformatics/btw587
If you want to get in touch, please write to niko@well.ox.ac.uk.