This implementation of the HRDetect pipeline is intended for RESEARCH USE ONLY and NOT FOR THE PURPOSE OF INFORMING CLINICAL DECISION-MAKING.
This repository contains an implementation of HRDetect used by Eric Y. Zhao at the Genome Sciences Centre.
HRDetect is run via a Snakemake pipeline. It depends on a working installation of R with numerous dependencies. These can be installed by running `make dependencies`, which builds a Miniconda environment with the necessary packages. Once that environment is installed, activate it by running `source dependencies/miniconda3/bin/activate dependencies` before running the pipeline.
Depending on your system and needs, some of these dependencies may require tweaking. Notably, the dependency installer assumes a Linux-based environment.
Two in-house tools, SignIT and hrdtools, are also required. They can be acquired by running `make`:
```
> make
if [ -d git/hrdtools ]; \
then(cd git/hrdtools && git pull); \
else git clone git@github.com:eyzhao/hrdtools.git git/hrdtools; \
fi
Cloning into 'git/hrdtools'...
remote: Counting objects: 97, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 97 (delta 1), reused 0 (delta 0), pack-reused 91
Receiving objects: 100% (97/97), 163.64 KiB | 821.00 KiB/s, done.
Resolving deltas: 100% (16/16), done.
if [ -d git/SignIT ]; \
then(cd git/SignIT && git pull); \
else git clone git@github.com:eyzhao/SignIT.git git/SignIT; \
fi
Cloning into 'git/SignIT'...
remote: Counting objects: 394, done.
remote: Compressing objects: 100% (113/113), done.
remote: Total 394 (delta 50), reused 58 (delta 22), pack-reused 259
Receiving objects: 100% (394/394), 1.11 MiB | 1.44 MiB/s, done.
Resolving deltas: 100% (153/153), done.
```
HRDetect expects data files of specific types and formats at specific locations. All files are housed at `data/{project}/{subproject}/patients/{patient}/{sample}/`, where `{patient}` and `{sample}` together form a unique ID pair for a given sample. Each such directory must contain four sample-specific files:

- `segments.tsv`
- `somatic_indels.vcf`
- `somatic_snvs.vcf`
- `somatic_sv.tsv`
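Before launching a run, it can help to confirm that each sample directory is complete. The following is a minimal Python sketch, not part of this repository: the helper names `sample_dir` and `missing_inputs` are hypothetical, and it only checks that the four files listed above exist at the documented path.

```python
from pathlib import Path

# The four per-sample input files the pipeline expects (from the list above).
REQUIRED_FILES = (
    "segments.tsv",
    "somatic_indels.vcf",
    "somatic_snvs.vcf",
    "somatic_sv.tsv",
)


def sample_dir(project: str, subproject: str, patient: str, sample: str,
               root: str = "data") -> Path:
    """Build data/{project}/{subproject}/patients/{patient}/{sample}/ (hypothetical helper)."""
    return Path(root) / project / subproject / "patients" / patient / sample


def missing_inputs(directory: Path) -> list[str]:
    """Return the names of required input files that are absent from `directory`."""
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]
```

For example, `missing_inputs(sample_dir("example", "sub", "P1", "S1"))` returns an empty list only when all four files are present; the project, subproject, patient and sample IDs here are placeholders.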
If you would like to use the Snakemake pipeline as is, you can provide a project-specific file under the `projects` directory; some project files are already there as examples. You can then link to the project by adding the line `include: "projects/myproject.smk"` to `Snakefile`.
If you would like to construct your own pipeline structure, feel free to use the scripts in the `scripts` folder as needed.
For those interested in constructing their own working pipeline with Snakemake, this repository includes an example project named `example` to demonstrate how this could work:
- An example of raw input data can be found in `examples/example`.
- This example data is copied and/or processed into the appropriate format, which is then stored in `data/example`. The Snakemake pipeline commands that pre-process the raw data are found in `projects/example.smk`.
- The expected output after running the full pipeline can be found in `output/example`.
- The `Snakefile` is currently set up to run this example data. If you would like to run an integration test yourself, try deleting part or all of `output/example` and `data/example` and running `snakemake -p` (adding `-p` doesn't change how Snakemake runs, but prints the shell commands issued in each step).
Note that you may still run into errors/issues depending on your specific environment setup. As this pipeline has not yet been tested across a wide variety of environments, we are continuing to work out the kinks.
`segments.tsv`: a file of segmented CNV/LOH calls with at least 5 columns:

- `chr`: The chromosome name
- `start`: Start position of the CNV/LOH call
- `end`: End position of the CNV/LOH call
- `copy_number`: The tumour copy number of the segment
- `lohtype`: The type of LOH state. Should be one of the following:
  - `ASCNA`: Allele-specific copy number amplification
  - `BCNA`: Balanced copy number amplification
  - `HET`: Heterozygous (normal)
  - `NLOH`: Neutral LOH (loss of heterozygosity, but 2 copies present)
  - `DLOH`: Deletion LOH
  - `ALOH`: Amplification LOH
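A quick pre-flight check of a segments file can catch formatting problems before the R scripts run. This is a hypothetical Python sketch (the function `check_segments` is not part of this repository); it assumes the file is tab-delimited with a header row, as the `.tsv` extension suggests, and it only verifies the five required columns, the allowed `lohtype` values and integer coordinates.

```python
import csv

# Minimum columns described above; extra columns are tolerated.
SEGMENT_COLUMNS = {"chr", "start", "end", "copy_number", "lohtype"}
# Permitted values of the lohtype column, as listed above.
LOH_TYPES = {"ASCNA", "BCNA", "HET", "NLOH", "DLOH", "ALOH"}


def check_segments(path) -> list[str]:
    """Return a list of problems found in a segments.tsv file (empty if none)."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = SEGMENT_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            if row["lohtype"] not in LOH_TYPES:
                problems.append(f"line {line_no}: unknown lohtype {row['lohtype']!r}")
            try:
                int(row["start"])
                int(row["end"])
            except (TypeError, ValueError):  # TypeError covers short rows (None values)
                problems.append(f"line {line_no}: non-integer start/end")
    return problems
```

An empty return value means the file passed these checks; it does not guarantee that downstream scripts will accept every value.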
`somatic_indels.vcf`: a VCF file containing indels, which can be parsed in R using the `readVcf()` function of VariantAnnotation.

`somatic_snvs.vcf`: a VCF file containing SNVs, which can be parsed in R using the `readVcf()` function of VariantAnnotation.
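The authoritative test is whether `readVcf()` accepts the file, but a lightweight structural check can flag obviously malformed inputs without starting R. The sketch below is a hypothetical Python helper (`vcf_record_count` is not part of this repository); it assumes an uncompressed VCF and checks only what the VCF specification requires of every file: a leading `##fileformat` line and a `#CHROM` header with the eight fixed columns.

```python
# The eight mandatory fixed columns of the VCF header line.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_record_count(path) -> int:
    """Structurally check an uncompressed VCF and return its number of data records.

    Raises ValueError if the ##fileformat line or the #CHROM header is missing
    or malformed, or if a data line has fewer than eight columns.
    """
    header_seen = False
    records = 0
    with open(path) as handle:
        if not handle.readline().startswith("##fileformat=VCF"):
            raise ValueError("first line is not a ##fileformat=VCF line")
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if not line.strip() or line.startswith("##"):
                continue  # skip blank lines and meta-information lines
            if line.startswith("#"):
                if fields[:8] != FIXED_COLUMNS:
                    raise ValueError("malformed #CHROM header line")
                header_seen = True
            elif not header_seen:
                raise ValueError("data line found before the #CHROM header")
            elif len(fields) < 8:
                raise ValueError(f"data line with only {len(fields)} columns")
            else:
                records += 1
    if not header_seen:
        raise ValueError("no #CHROM header line found")
    return records
```

A file that passes this check can still be rejected by VariantAnnotation (for example, because of inconsistent INFO definitions), so treat it as a first-pass filter only.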
`somatic_sv.tsv`: a tab-delimited file with structural variant data. It should contain the following columns:

- `chr1`: Name of the first chromosome involved
- `pos1`: Coordinate of the SV breakpoint on the first chromosome
- `chr2`: Name of the second chromosome involved
- `pos2`: Coordinate of the SV breakpoint on the second chromosome
- `type`: One of `DEL`, `DUP`, `TRA`, or `INV`. Unless the value is `TRA`, the two chromosomes should be the same.
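The two rules above (an allowed `type` value, and matching chromosomes for everything except `TRA`) are easy to check mechanically. This is a hypothetical Python sketch (`check_sv` is not part of this repository) that assumes a header row naming the columns; it does not validate breakpoint coordinates.

```python
import csv

# Required columns and allowed SV types, as described above.
SV_COLUMNS = {"chr1", "pos1", "chr2", "pos2", "type"}
SV_TYPES = {"DEL", "DUP", "TRA", "INV"}


def check_sv(path) -> list[str]:
    """Return a list of problems found in a somatic_sv.tsv file (empty if none)."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = SV_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            sv_type = row["type"]
            if sv_type not in SV_TYPES:
                problems.append(f"line {line_no}: unknown type {sv_type!r}")
            elif sv_type != "TRA" and row["chr1"] != row["chr2"]:
                # Only translocations may join two different chromosomes.
                problems.append(
                    f"line {line_no}: {sv_type} spans {row['chr1']} and {row['chr2']}"
                )
    return problems
```

Note that the sketch deliberately does not require `TRA` events to span two different chromosomes, since the description above does not state that rule.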
If you use HRDtools or this implementation of HRDetect in your publication, please cite the following study: