/vcf_accuracy

vcf accuracy evaluator using VT, BEDTOOLS, PyVCF, and TABIX

Primary LanguagePython

VCFs are Great

Install some Libs

Expected formatting

MAF input requirements

  • Requires standard TCGA MAF columns (https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+(MAF)+Specification):

    • Hugo_Symbol, Entrez_Gene_ID, Center, NCBI_Build, Chromosome, Start_Position, End_Position, Strand, Variant_Classification, Variant_Type, Reference_Allele, Tumor_Seq_Allele1, Tumor_Seq_Allele2, dbSNP_RS, dbSNP_Val_Status, Tumor_Sample_Barcode, Matched_Norm_Sample_Barcode, Match_Norm_Seq_Allele1, Match_Norm_Seq_Allele2, Tumor_Validation_Allele1, Tumor_Validation_Allele2, Match_Norm_Validation_Allele1, Match_Norm_Validation_Allele2, Verification_Status, Validation_Status, Mutation_Status, Sequencing_Phase, Sequence_Source, Validation_Method, Score, BAM_File, Sequencer
  • Requires MSKCC specific columns:

    • n_ref_count, n_alt_count, n_depth, t_alt_count, t_ref_count, t_depth
  • If subsetting the bedfile chromosome labelling must match the variant file style chr1 vs. 1, MT vs. M etc.

  • Pre-normalized datasets must be left aligned and processed using VT (http://genome.sph.umich.edu/wiki/Vt).