Clinical-Genomics-Lund/nextflow_wgs

Change default handling of undefined annotations when selecting `most_severe_consequence` in `modify_scout_vcf.pl`

alkc opened this issue


TLDR

Consequence terms that are missing from the ranking table, such as those added by newer versions of VEP, are sorted by Perl as the most severe.

By default, such terms should instead be ranked lowest, or the script should crash.
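A minimal sketch of the "rank them lowest" option, assuming a cut-down copy of the `%rank` hash from `modify_scout_vcf.pl`. The helper `rank_of` and the term `new_vep_term` are hypothetical; the idea is to fall back, via Perl's defined-or operator `//`, to a rank one past the least severe known term instead of relying on `undef` being read as 0.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Cut-down copy of %rank (lower number = more severe)
my %rank = (
    'frameshift_variant' => 3,
    'missense_variant'   => 12,
    'intron_variant'     => 25,
);

# Hypothetical fallback: unranked terms sort after every ranked term
my $unranked = max(values %rank) + 1;

# Hypothetical helper: rank of a consequence, or the fallback if it is unknown
sub rank_of {
    my ($csq) = @_;
    return $rank{$csq} // $unranked;
}

# 'new_vep_term' stands in for a consequence added by a newer VEP release
my @consequences = ('intron_variant', 'new_vep_term', 'frameshift_variant');
my $most_severe = (sort { rank_of($a) <=> rank_of($b) } @consequences)[0];
print "$most_severe\n";    # prints "frameshift_variant"
```

This keeps the pipeline running, but it silently hides new terms, which is why the crash option below may be preferable.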

Background

Consequences are ranked here:

my %rank = (
    'transcript_ablation' => 1,
    'initiator_codon_variant' => 2,
    'frameshift_variant' => 3,
    'stop_gained' => 4,
    'start_lost' => 5,
    'stop_lost' => 6,
    'splice_acceptor_variant' => 7,
    'splice_donor_variant' => 8,
    'inframe_deletion' => 9,
    'transcript_amplification' => 10,
    'splice_region_variant' => 11,
    'missense_variant' => 12,
    'protein_altering_variant' => 13,
    'inframe_insertion' => 14,
    'incomplete_terminal_codon_variant' => 15,
    'non_coding_transcript_exon_variant' => 16,
    'synonymous_variant' => 17,
    'mature_mirna_variant' => 18,
    'non_coding_transcript_variant' => 19,
    'regulatory_region_variant' => 20,
    'upstream_gene_variant' => 21,
    'regulatory_region_amplification' => 22,
    'tfbs_amplification' => 23,
    '5_prime_utr_variant' => 24,
    'intron_variant' => 25,
    '3_prime_utr_variant' => 26,
    'feature_truncation' => 27,
    'tf_binding_site_variant' => 28,
    'start_retained_variant' => 29,
    'stop_retained_variant' => 30,
    'feature_elongation' => 31,
    'regulatory_region_ablation' => 32,
    'tfbs_ablation' => 33,
    'coding_sequence_variant' => 34,
    'downstream_gene_variant' => 35,
    'nmd_transcript_variant' => 36,
    'intergenic_variant' => 37
);

and retrieved/sorted here:

## MOST SEVERE CONSEQUENCE
my $csq_ref = $doobi->{INFO}->{CSQ};
my $m_s_c = CSQ($csq_ref);
my $most_severe = ".";
if (@$m_s_c) {
    $_ = lc for @$m_s_c;
    $most_severe = (sort { $rank{$a} <=> $rank{$b} } @$m_s_c)[0];
}
push @add_info_field, "most_severe_consequence=".$most_severe;

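Looking up a key that does not exist in `%rank` returns `undef`, which the numeric comparison `<=>` in the sort above treats as 0 (with an "uninitialized value" warning if warnings are enabled). Since every ranked consequence has a value of at least 1, unranked annotations are always sorted ahead of all ranked ones and end up reported as `most_severe_consequence`.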
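A minimal reproduction of the behaviour, assuming a cut-down `%rank` and a hypothetical term `new_vep_term` that the table does not contain. The sort expression is the same as in the script.

```perl
use strict;
use warnings;

# Cut-down copy of %rank from modify_scout_vcf.pl
my %rank = (
    'frameshift_variant' => 3,
    'missense_variant'   => 12,
    'intron_variant'     => 25,
);

# 'new_vep_term' is a hypothetical consequence missing from %rank
my @consequences = ('intron_variant', 'new_vep_term', 'frameshift_variant');

# $rank{'new_vep_term'} is undef; <=> treats it as 0, so it sorts first
# (Perl also warns about an uninitialized value on STDERR)
my $most_severe = (sort { $rank{$a} <=> $rank{$b} } @consequences)[0];
print "$most_severe\n";    # prints "new_vep_term"
```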

Instead, the script should crash with an error that alerts the developer that new, missing, or misspelled annotations need to be handled.
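One way this could look, as a sketch only: a hypothetical `most_severe_consequence` helper that checks every term with `exists` before sorting and dies with the offending names. It keeps the script's `"."` result for an empty consequence list; the hash is again a cut-down copy.

```perl
use strict;
use warnings;

my %rank = (
    'frameshift_variant' => 3,
    'missense_variant'   => 12,
    'intron_variant'     => 25,
);

# Hypothetical helper: returns the most severe consequence,
# or dies if any consequence is not present in the rank table
sub most_severe_consequence {
    my ($rank, @csq) = @_;
    return '.' unless @csq;    # same default as the original script
    my @unknown = grep { !exists $rank->{$_} } @csq;
    die "Unranked consequence(s): @unknown; add them to the rank table\n" if @unknown;
    return (sort { $rank->{$a} <=> $rank->{$b} } @csq)[0];
}

my $ok = most_severe_consequence(\%rank, 'intron_variant', 'frameshift_variant');
print "$ok\n";    # prints "frameshift_variant"

# An unknown term now stops the script instead of winning the sort
my $err;
eval { most_severe_consequence(\%rank, 'intron_variant', 'new_vep_term'); 1 } or $err = $@;
print "died: $err";
```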

Furthermore, the ranking should be moved out of the script and into a configuration file.
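A possible shape for that, under stated assumptions: the config format (one consequence term per line, most severe first, `#` comments allowed) and the `load_rank` function are hypothetical, not an existing part of the pipeline. Deriving the rank from line position avoids duplicate or skipped numbers, and the loader refuses duplicate terms. An in-memory filehandle stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical loader: one SO term per line, most severe first.
# Blank lines and '#' comments are ignored; rank = position in the file.
sub load_rank {
    my ($fh) = @_;
    my %rank;
    my $n = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/#.*//;           # strip comments
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';
        my $term = lc $line;
        die "Duplicate consequence '$term' in rank config\n" if exists $rank{$term};
        $rank{$term} = ++$n;
    }
    die "Rank config is empty\n" unless $n;
    return \%rank;
}

# In the pipeline this would be a file path given to the script (hypothetical);
# an in-memory file keeps the example self-contained.
my $config = <<'END';
# most severe first
transcript_ablation
frameshift_variant
missense_variant
intron_variant
END

open(my $fh, '<', \$config) or die "Cannot open config: $!";
my $rank = load_rank($fh);
close $fh;

print "$_ => $rank->{$_}\n" for sort { $rank->{$a} <=> $rank->{$b} } keys %$rank;
```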