openvar/variantValidator

Unsupported transcripts

Closed this issue · 5 comments

Describe the bug
An anonymous user has been trying to validate NM_000109.2:r.4540G>C, but this triggers an ERROR message to the sysadmins.

On screen, the error message is "Unable to validate the submitted variant NM_000109.2:r.4540G>C against the GRCh37 assembly."

The error is triggered because transcript NM_000109.2 is not in the database, whereas versions 3 and 4 are included. If the description is corrected to NM_000109.4:r.4540g>c then it validates.

VV ought to trap the case where the reference sequence is not supported and output a user-friendly warning.
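A minimal sketch of the kind of trap being asked for, not VariantValidator's actual code. The `KNOWN_TRANSCRIPTS` set, the `check_reference_supported` function and the warning wording are all hypothetical; the set simply mirrors the situation reported above, where versions 3 and 4 are present and version 2 is not.

```python
# Hypothetical stand-in for the transcript database: per the report,
# NM_000109.3 and NM_000109.4 are present but NM_000109.2 is not.
KNOWN_TRANSCRIPTS = {"NM_000109.3", "NM_000109.4"}


def check_reference_supported(description, known=KNOWN_TRANSCRIPTS):
    """Return None if the reference is known, else a warning string."""
    # The accession is everything before the first colon.
    accession = description.split(":", 1)[0]
    if accession in known:
        return None
    # Offer other versions of the same accession, if any are known.
    base = accession.split(".", 1)[0]
    alternatives = sorted(t for t in known if t.split(".", 1)[0] == base)
    message = f"Reference sequence {accession} is not supported"
    if alternatives:
        message += "; available versions: " + ", ".join(alternatives)
    return message


print(check_reference_supported("NM_000109.2:r.4540G>C"))
```

Returning a message rather than raising keeps the outcome in the user-facing warnings instead of the sysadmin error channel.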

NM_000109.2:r.4540G>C triggers

"validation_warnings": [
"This not a valid HGVS description, due to characters being in the wrong case. Please check the use of upper- and lowercase characters.",
"RNA sequence must be lower-case"
],

Or will do when I update. Will test g>c now

On the dev version I get this for GRCh38

import json
import VariantValidator

# Validate the lower-case RNA description on the dev version against GRCh38
vval = VariantValidator.Validator()
variant = "NM_000109.2:r.4540g>c"
genome_build = 'GRCh38'
select_transcripts = 'all'
transcript_set = 'refseq'
validate = vval.validate(variant, genome_build, select_transcripts, transcript_set)

# Print the result, including version metadata, as pretty-printed JSON
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))
{
    "NM_000109.2:c.4540G>C": {
        "alt_genomic_loci": [],
        "annotations": {
            "chromosome": "X",
            "db_xref": {
                "CCDS": null,
                "HPRD": "02303",
                "ensemblgene": null,
                "hgnc": "HGNC:2928",
                "ncbigene": "1756",
                "select": false
            },
            "ensembl_select": false,
            "mane_plus_clinical": false,
            "mane_select": false,
            "map": "Xp21.2-p21.1",
            "note": "dystrophin",
            "refseq_select": false,
            "variant": "DP427C"
        },
        "gene_ids": {
            "ccds_ids": [
                "CCDS14234",
                "CCDS48091",
                "CCDS14230",
                "CCDS14233",
                "CCDS14229",
                "CCDS94585",
                "CCDS94586",
                "CCDS14232",
                "CCDS55394",
                "CCDS55395",
                "CCDS14231"
            ],
            "ensembl_gene_id": "ENSG00000198947",
            "entrez_gene_id": "1756",
            "hgnc_id": "HGNC:2928",
            "omim_id": [
                "300377"
            ],
            "ucsc_id": "uc004dda.2"
        },
        "gene_symbol": "DMD",
        "genome_context_intronic_sequence": "",
        "hgvs_lrg_transcript_variant": "",
        "hgvs_lrg_variant": "",
        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "",
            "lrg_tlr": "",
            "slr": "NP_000100.2:p.(V1514L)",
            "tlr": "NP_000100.2:p.(Val1514Leu)"
        },
        "hgvs_refseqgene_variant": "",
        "hgvs_transcript_variant": "NM_000109.2:c.4540G>C",
        "primary_assembly_loci": {
            "grch37": {
                "hgvs_genomic_description": "NC_000023.10:g.32404537C>G",
                "vcf": {
                    "alt": "G",
                    "chr": "X",
                    "pos": "32404537",
                    "ref": "C"
                }
            },
            "grch38": {
                "hgvs_genomic_description": "NC_000023.11:g.32386420C>G",
                "vcf": {
                    "alt": "G",
                    "chr": "X",
                    "pos": "32386420",
                    "ref": "C"
                }
            },
            "hg19": {
                "hgvs_genomic_description": "NC_000023.10:g.32404537C>G",
                "vcf": {
                    "alt": "G",
                    "chr": "chrX",
                    "pos": "32404537",
                    "ref": "C"
                }
            },
            "hg38": {
                "hgvs_genomic_description": "NC_000023.11:g.32386420C>G",
                "vcf": {
                    "alt": "G",
                    "chr": "chrX",
                    "pos": "32386420",
                    "ref": "C"
                }
            }
        },
        "reference_sequence_records": {
            "protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_000100.2",
            "transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_000109.2"
        },
        "refseqgene_context_intronic_sequence": "",
        "rna_variant_descriptions": {
            "rna_variant": "NM_000109.2:r.4540g>c",
            "translation": "NP_000100.2:p.Val1514Leu",
            "translation_slr": "NP_000100.2:p.V1514L",
            "usage_warnings": [
                "RNA (r.) descriptions are independent of cDNA descriptions (c.)",
                "RNA descriptions must only be used if the RNA has been sequenced and must not be inferred from a cDNA description",
                "c. and g. descriptions provided by VariantValidator must only be used if the DNA sequence has been confirmed"
            ]
        },
        "selected_assembly": "GRCh38",
        "submitted_variant": "NM_000109.2:r.4540g>c",
        "transcript_description": "Homo sapiens dystrophin (DMD), transcript variant Dp427c, mRNA",
        "validation_warnings": [
            "NM_000109.2:r.4540G>C automapped to NM_000109.2:c.4540G>C",
            "A more recent version of the selected reference sequence NM_000109.2 is available (NM_000109.4): NM_000109.4:c.4540G>C MUST be fully validated prior to use in reports: select_variants=NM_000109.4:c.4540G>C"
        ],
        "variant_exonic_positions": {
            "NC_000023.11": {
                "end_exon": "33",
                "start_exon": "33"
            }
        }
    },
    "flag": "gene_variant",
    "metadata": {
        "variantvalidator_hgvs_version": "2.2.0",
        "variantvalidator_version": "2.2.1.dev530+g520a21a.d20240222",
        "vvdb_version": "vvdb_2023_8",
        "vvseqrepo_db": "VV_SR_2024_01/master",
        "vvta_version": "vvta_2024_01"
    }
}
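For comparing dev and live behaviour it can help to look only at the warnings in output like the above. This is a sketch that assumes the dictionary layout shown here (one entry per variant, plus "flag" and "metadata"); `collect_warnings` is a hypothetical helper, not part of VariantValidator, and `sample` is a trimmed copy of the output.

```python
def collect_warnings(validation):
    """Map each variant key to its list of validation warnings."""
    return {
        key: record.get("validation_warnings", [])
        for key, record in validation.items()
        # Skip the top-level "flag" string and the "metadata" block.
        if isinstance(record, dict) and key != "metadata"
    }


# Trimmed copy of the output above, kept only for illustration.
sample = {
    "NM_000109.2:c.4540G>C": {
        "submitted_variant": "NM_000109.2:r.4540g>c",
        "validation_warnings": [
            "NM_000109.2:r.4540G>C automapped to NM_000109.2:c.4540G>C",
        ],
    },
    "flag": "gene_variant",
    "metadata": {"vvdb_version": "vvdb_2023_8"},
}

for key, warnings in collect_warnings(sample).items():
    for warning in warnings:
        print(f"{key}: {warning}")
```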

So this will work

Same for GRCh37. So will close this as it seems to be resolved. Requires server updates.

It could be down to the new databases, though, so I cannot replicate it in dev.

If this has only been checked on the dev server, you are perhaps not seeing what I am seeing on the live interactive validator.

  • NM_000109.2:r.4540G>C triggers an ERROR message to the sysadmins and the on-screen warning "Unable to validate the submitted variant NM_000109.2:r.4540G>C against the GRCh38 assembly." It never gets as far as generating the warnings "This not a valid HGVS description, due to characters being in the wrong case. Please check the use of upper- and lowercase characters." and "RNA sequence must be lower-case". The reason is that transcript NM_000109.2 is not in the database used by the interactive service. You can check this using gene2trans.
  • NM_000109.4:r.4540G>C does trigger the warnings about wrong-case letters because NM_000109.4 is in the database.
  • NM_000109.4:r.4540g>c validates without errors or warnings.

This means that the wrong-case letters warning does already exist in the live interactive validator but that the warning is not triggered because the reference sequence has presumably not been found. However, that situation ought to trigger a "sequence not found" warning.
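The ordering described here can be sketched as follows: look up the reference sequence first, and only then check the letter case. This is an illustration under assumptions, not the validator's implementation. `LIVE_TRANSCRIPTS`, `precheck` and the "Sequence not found" wording are hypothetical, and the case check is a deliberately crude regular expression.

```python
import re

# Hypothetical stand-in for the live service's transcript database.
LIVE_TRANSCRIPTS = {"NM_000109.3", "NM_000109.4"}


def precheck(description, known=LIVE_TRANSCRIPTS):
    """Return warnings, checking the reference before the letter case."""
    accession, _, variant = description.partition(":")
    if accession not in known:
        # Report the missing reference instead of failing later on.
        return [f"Sequence not found: {accession}"]
    # Crude case check: any upper-case nucleotide in an r. description.
    if variant.startswith("r.") and re.search(r"[ACGTU]", variant[2:]):
        return ["RNA sequence must be lower-case"]
    return []


for text in (
    "NM_000109.2:r.4540G>C",
    "NM_000109.4:r.4540G>C",
    "NM_000109.4:r.4540g>c",
):
    print(text, "->", precheck(text))
```

The three inputs correspond to the three cases listed above: reference missing, reference present with wrong-case letters, and a clean description.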

There might still be more to check.

It won't be. The servers need updating with the latest versions of the software. I'm not ready for release yet, but hope to do it soon.