By: Wai Yi Leung, Tobias Marschall, Laurent Falquet, Yogesh Paudel, Hailiang Mei, Alex Schoenhuth and Tiffanie Yael Moss
This repository is used to store scripts written during the hackathon of ALLBio Testcase 2.
We aim to provide:
- a pipeline for automated structural variation (SV) calling
- an automated approach for benchmarking (new) SV tools.
More information about the project can be found at the following websites:
ALLBio Bioinformatics, Testcase#2, Google site, members only!
Clone this repository from GitHub into your home folder, where it will be stored in allbiotc2:
cd ~
git clone https://github.com/ALLBio/allbiotc2.git
cd allbiotc2/
make install
The make install command performs a system-wide install. This step requires sudo rights.
The installation scripts are located in the following repository. These scripts were used to install the workshop-ready and production-ready virtual machines.
https://github.com/ALLBio/allbiovm
Comments are welcome via the GitHub ticketing system.
If reference calls are provided in SDI format, they can be converted from SDI to VCF with the following procedure:
make -f ../scripts/Makefile \
REFERENCE_VCF=~/myworkdir/ref_all.complete.vcf \
SDI_FILE=~/myworkdir/ler_0.v7c.sdi \
preprocess
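Before converting, it can help to check what the SDI file actually contains. The following is a minimal sketch, not part of the pipeline: the helper name sdi_summary is hypothetical, and it assumes the tab-separated SDI column layout chromosome, position, length, reference allele, consensus allele, with length being negative for deletions and positive for insertions. If your SDI files use a different layout, the column numbers need adjusting.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): count SDI records by type.
# Assumed columns (tab-separated): chrom, pos, length, ref allele, consensus allele.
# Assumed meaning of length: <0 deletion, >0 insertion, 0 substitution.
# $1 = SDI file, $2 = optional minimum event size in bases (default 0).
sdi_summary() {
    awk -F '\t' -v min="${2:-0}" '
        NF < 5 { next }                      # skip blank or truncated lines
        { len = $3 + 0; size = (len < 0) ? -len : len }
        size < min { next }                  # keep only events of at least min bases
        len > 0  { n_ins++ }
        len < 0  { n_del++ }
        len == 0 { n_sub++ }
        END { printf "insertions=%d deletions=%d substitutions=%d\n", n_ins, n_del, n_sub }
    ' "$1"
}

# Usage (path taken from the conversion example above):
#   sdi_summary ~/myworkdir/ler_0.v7c.sdi 10
```

Passing a minimum size as the second argument restricts the counts to larger events, which gives a rough idea of how many reference calls are of SV size.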
The software for the pipeline is placed into one central location in the following setup:
allbio@workbench:/virdir/Scratch/software$ tree -L 1
.
├── bowtie2-2.1.0
├── breakdancer
├── bwa-0.7.4
├── circos-0.63-4
├── clever-sv
├── delly_v0.0.9
├── dwac-seq0.7
├── FastQC
├── gasv
├── picard-tools-1.86
├── pindel
├── PRISM_1_1_6
├── samtools-0.1.19
├── sickle-master
└── SVDetect_r0.8b
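Since the pipeline expects every tool under this one directory, a quick pre-flight check can catch a missing installation early. This is a sketch only: the function check_programs is a hypothetical helper, and the directory names are copied from the listing above and may differ on your system.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the pipeline).
# $1 = central software directory, remaining arguments = expected subdirectories.
# Returns 0 if all subdirectories exist, non-zero otherwise.
check_programs() {
    programs_dir=$1
    shift
    missing=0
    for tool in "$@"; do
        if [ ! -d "$programs_dir/$tool" ]; then
            echo "missing: $programs_dir/$tool" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example: check a few of the tools from the listing before starting a run.
PROGRAMS=${PROGRAMS:-/virdir/Scratch/software}
check_programs "$PROGRAMS" bwa-0.7.4 samtools-0.1.19 pindel breakdancer \
    || echo "some tools are missing under $PROGRAMS" >&2
```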
Configuration can be done in conf.mk, or by passing variables on the command line when invoking the pipeline.
The most important and required variables are:
- PROGRAMS: Path to the directory where the programs are installed
- PYTHON_EXE: Path to the Python executable, defaults to python (system distributed version)
- REFERENCE_DIR: Path to the reference
- REFERENCE_VCF: Full path to the VCF file with reference SV calls for benchmarking
- FASTQ_EXTENSION: Filename extension of the FastQ files
- PEA_MARK: File naming of the left read of FastQ: sample-PEA_MARK.FASTQ_EXTENSION
- PEB_MARK: File naming of the right read of FastQ: sample-PEB_MARK.FASTQ_EXTENSION
- *_THREADS: Number of cores to be used by the programs.
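To illustrate how the read markers and the extension combine into file names, here is a small sketch. The helper fastq_name is hypothetical, and the concatenation rule (sample prefix, then the mark, then a dot and the extension) is an assumption inferred from the example settings PEA_MARK=.1 and FASTQ_EXTENSION=fastq together with input files such as sim-reads.511_10.1.fastq.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): build the expected FastQ
# file name for one mate of a paired-end sample.
# Assumed rule: <sample><mark>.<extension>
# $1 = sample prefix, $2 = PEA_MARK or PEB_MARK, $3 = FASTQ_EXTENSION
fastq_name() {
    printf '%s%s.%s\n' "$1" "$2" "$3"
}

# Example with the values used elsewhere in this README.
fastq_name sim-reads.511_10 .1 fastq   # left read
fastq_name sim-reads.511_10 .2 fastq   # right read
```

If the names printed this way do not match the files in the input directory, the marks or the extension are likely set incorrectly.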
Example invocation of the pipeline:
THREADS=8
make -f ../scripts/Makefile \
PROGRAMS=/virdir/Scratch/software \
REFERENCE_DIR=../input/reference_tair9 \
FASTQC_THREADS=$THREADS \
BWA_OPTION_THREADS=$THREADS \
PEA_MARK=.1 \
PEB_MARK=.2 \
FASTQ_EXTENSION=fastq \
REFERENCE_VCF=/virdir/Backup/reads_and_reference/vcf_reference/ref_all.complete.vcf
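For repeated runs it may be convenient to keep the settings in one small wrapper and print the resulting command for review before executing it. The sketch below is an illustration under assumptions: the function build_make_command is hypothetical, and the default values are simply the ones from the example invocation above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the pipeline): assemble the make command
# from a few shell variables and print it, so it can be reviewed first.
# Defaults are copied from the example invocation; override them as needed.
THREADS=${THREADS:-8}
PROGRAMS=${PROGRAMS:-/virdir/Scratch/software}
REFERENCE_DIR=${REFERENCE_DIR:-../input/reference_tair9}
REFERENCE_VCF=${REFERENCE_VCF:-/virdir/Backup/reads_and_reference/vcf_reference/ref_all.complete.vcf}

build_make_command() {
    printf 'make -f ../scripts/Makefile'
    printf ' PROGRAMS=%s' "$PROGRAMS"
    printf ' REFERENCE_DIR=%s' "$REFERENCE_DIR"
    printf ' FASTQC_THREADS=%s' "$THREADS"
    printf ' BWA_OPTION_THREADS=%s' "$THREADS"
    printf ' PEA_MARK=.1 PEB_MARK=.2 FASTQ_EXTENSION=fastq'
    printf ' REFERENCE_VCF=%s\n' "$REFERENCE_VCF"
}

# Print the command instead of running it.
build_make_command
```

Adding make's -n flag to the printed command performs a dry run, which lists the recipes that would be executed without running them.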
allbio@workbench:/opt/allbio/runs/synthetic_run$ tree -L 1
.
├── input
│   ├── reference_tair10
│   │   ├── bowtie2
│   │   ├── bwa
│   │   ├── reference.fa
│   │   └── reference.fa.fai
│   ├── sim-reads_1.fastq
│   ├── sim-reads_2.fastq
│   ├── sim-reads.409_10.1.fastq
│   ├── sim-reads.409_10.2.fastq
│   ├── sim-reads.511_10.1.fastq
│   ├── sim-reads.511_10.2.fastq
├── log
├── run_integrationtest
│   ├── bd.cfg
│   ├── comparison.tex
│   ├── run.sh
│   ├── sim-read-511_10.1.fastq -> ../input/sim-reads.511_10.1.fastq
│   ├── sim-read-511_10.1.filtersync.stats
│   ├── sim-read-511_10.1.singles.fastq
│   ├── sim-read-511_10.1.trimmed.fastq
│   ├── sim-read-511_10.2.fastq -> ../input/sim-reads.511_10.2.fastq
│   ├── sim-read-511_10.2.trimmed.fastq
│   ├── sim-read-511_10.bam
│   ├── sim-read-511_10.bam.bai
│   ├── sim-read-511_10.bd.vcf
│   ├── sim-read-511_10.breakdancer
│   ├── sim-read-511_10.delly
│   ├── sim-read-511_10.delly.vcf
│   ├── sim-read-511_10.flagstat
│   ├── sim-read-511_10.gasv
│   ├── sim-read-511_10.gasv.vcf
│   ├── sim-read-511_10.pindel
│   ├── sim-read-511_10.pindel.vcf
│   ├── sim-read-511_10.prism
│   ├── sim-read-511_10.prism.vcf
│   ├── sim-read-511_10.raw_fastqc
│   ├── sim-read-511_10.sam
│   ├── sim-read-511_10.trimmed_fastqc
│   └── sim-read-511_10.unsort.bam
└── scripts
    └── Makefile -> ~/allbiotc2/Makefile
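A run directory with this layout can be prepared with a few commands. The following is a sketch under assumptions, not the project's own setup script: the function setup_run is hypothetical, it links every *.fastq file from input into the run directory under its original name, and the Makefile location (~/allbiotc2/Makefile in the listing) is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): create the directory layout
# shown above and link the input reads into a new run directory.
# $1 = base directory, $2 = name of the run directory, $3 = path to the Makefile
setup_run() {
    base=$1
    run=$2
    makefile=$3
    mkdir -p "$base/input" "$base/log" "$base/$run" "$base/scripts"
    # scripts/Makefile points at the pipeline Makefile.
    ln -sf "$makefile" "$base/scripts/Makefile"
    # Link each FastQ file with a relative target, as in the listing.
    for fq in "$base"/input/*.fastq; do
        [ -e "$fq" ] || continue          # no FastQ files present
        ln -sf "../input/$(basename "$fq")" "$base/$run/"
    done
}

# Usage (paths are examples taken from this README):
#   setup_run /opt/allbio/runs/synthetic_run run_integrationtest ~/allbiotc2/Makefile
```

From inside the new run directory, the pipeline can then be started with make -f ../scripts/Makefile as in the example invocation.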