Software for calling small variants (or validating existing calls) in a small genome
javac src/*.java
This software uses the output of samtools mpileup to call variants based on simple allele frequency thresholds.
Usage: java -cp src CallVariants [args]
Example: java -cp src CallVariants pileup_file=jhu004.mpileup out_file=calls.vcf
Required args:
pileup_file (String) - the output of samtools mpileup with the read alignments and reference
out_file (String) - the output file to call SNPs
Optional args:
coverage_threshold (int) [20] - the min coverage needed to possibly flag a region
genome_max_len (int) [31000] - an upper bound on the genome length
alt_threshold (float) [0.15] - call a variant if any alt allele frequency > this value
ref_threshold (float) [0.60] - call an N even if no alt allele frequency is high enough if ref allele frequency < this value)
This software merges VCFs from multiple sources.
Usage: java -cp src MergeVariants [args]
Example: java -cp src MergeVariants filelist=vcflist.txt out_file=merged.vcf
Required args:
file_list (String) - a txt file containing absolute paths to VCF files, one on each line
out_file (String) - file to write merged variants to
This software takes a merged VCF and combines variants in adjacent positions.
Usage: java -cp src CombineVariants [args]
Example: java -cp src CombineVariants vcf_file=merged.vcf out_file=merged_combined.vcf
Required args:
vcf_file (String) - vcf file containing the variants after merging across samples
out_file (String) - file to output variants after combining adjact positions
This software takes a post-filtered TSV and creates 2 VCF files - one with all variants and one with consensus variants
Usage: java -cp src TableToVcf [args]
Example: java -cp src TableToVcf table_file=merged.vcf consensus_file=consensus.vcf all_file=all.vcf
Required args:
table_file (String) - vcf file containing the variants after merging across samples
consensus_file (String) - file to output consensus variants to
all_file (String) - file to output all variants to
Inputs are sample.fa
, sample.vcf
from some variant caller, and sample.bam
.
The variant calling/merging pipeline has been implemented in run.sh which takes the following parameters:
./run.sh <reference> <bam alignments> <vcf variant calls, different files separated by commas> <output prefix>