Clinical-Genomics/genmod

genmod score, data_type=string appears to match when the rule is a substring of the string in the VCF


If the VCF has "CLNSIG=Likely_pathogenic;", it will incorrectly match the Pathogenic rule (which has higher priority than the Likely_pathogenic rule), since "Pathogenic" is a substring of "Likely_pathogenic" (the comparison is case-insensitive).

This appears to be the code doing the matching: https://github.com/moonso/extract_vcf/blob/master/extract_vcf/plugin.py#L342
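
To make the suspected behaviour concrete, here is a minimal sketch. It is not the extract_vcf code itself: the rule table, the function names and the separator set are hypothetical, and only the scores 5 and 2 come from my rank model. The first function mimics a case-insensitive substring test, the second compares against whole tokens instead.

```python
import re

# Hypothetical rule table in priority order: (rule string, score).
# The scores mirror the minimal rank model in this issue.
RULES = [("Pathogenic", 5), ("Likely_pathogenic", 2)]


def score_substring(raw_value, rules=RULES):
    """Suspected current behaviour: case-insensitive substring match."""
    value = raw_value.lower()
    for rule, score in rules:
        if rule.lower() in value:
            return score
    return None


def score_exact(raw_value, rules=RULES):
    """Possible fix: split the INFO value into tokens and compare whole tokens.

    The separators (',', '|', '/') are an assumption about how multiple
    CLNSIG terms may be joined; adjust to the real model configuration.
    """
    tokens = {tok.lower() for tok in re.split(r"[,|/]", raw_value) if tok}
    for rule, score in rules:
        if rule.lower() in tokens:
            return score
    return None


if __name__ == "__main__":
    print(score_substring("Likely_pathogenic"))  # 5, the wrong rule wins
    print(score_exact("Likely_pathogenic"))      # 2, as expected
```

With whole-token comparison, "Likely_pathogenic" can only match the Likely_pathogenic rule, while a plain "Pathogenic" value still scores 5.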

Minimal example files:
minimal_rankmodel_string.ini.txt
minimal_string.vcf.txt

$ genmod score -i test -c minimal_rankmodel_string.ini.txt -r minimal_string.vcf.txt -o minimal_string.score

Gives me RankResult=5 (for Pathogenic), rather than the expected 2 (for Likely_pathogenic)!

dnil commented

If this replicates, it is most definitely a bug, and if nothing else the documentation should be updated. I believe a workaround was found rather early on, where scout loads all clinvar variants, but I can't quite remember the fix for the scoring part. Will check.