getzlab/SignatureAnalyzer

KeyError when using cosmic3_ID and cosmic3_DBS

Closed this issue · 3 comments

I'm getting an error when running signatureanalyzer with the cosmic3_ID reference. The full command I run is
signatureanalyzer -n 10 --reference cosmic3_ID --hg_build /data/shared/hg38/hg38.2bit --objective poisson --max_iter 50000 --prior_on_H L1 --prior_on_W L1 -o cosmic3_ID all.maf

The error message is

---------------------------------------------------------
---------- S I G N A T U R E  A N A L Y Z E R  ----------
---------------------------------------------------------
   * Creating output dir at cosmic3_ID
   * Using hg38 build
   * Using cosmic3_ID signatures
   * Loading spectra from all.maf
      * Mapping contexts: 2114492 / 2114493
Traceback (most recent call last):
  File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 252, in get_spectra_from_maf
    maf['context83.num'] = maf['context83.word'].apply(context83.__getitem__)
  File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/apply.py", line 1102, in apply_standard
    convert=self.convert_dtype,
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
KeyError: '2delm2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dylan/miniconda3/bin/signatureanalyzer", line 33, in <module>
    sys.exit(load_entry_point('signatureanalyzer', 'console_scripts', 'signatureanalyzer')())
  File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/__main__.py", line 197, in main
    **vars(args)
  File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/signatureanalyzer.py", line 89, in run_maf
    reference=reference
  File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 254, in get_spectra_from_maf
    raise KeyError('Unusual context: ' + str(e))
KeyError: "Unusual context: '2delm2'"

I get a similar error when using cosmic3_DBS as the reference:

---------------------------------------------------------
---------- S I G N A T U R E  A N A L Y Z E R  ----------
---------------------------------------------------------
   * Creating output dir at cosmic3_DBS
   * Using hg38 build
   * Using cosmic3_DBS signatures
   * Loading spectra from all.maf
Traceback (most recent call last):
  File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 143, in get_spectra_from_maf
    maf['context78.num'] = contig.apply(context78.__getitem__)
  File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/apply.py", line 1102, in apply_standard
    convert=self.convert_dtype,
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
KeyError: 'GC>GG'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dylan/miniconda3/bin/signatureanalyzer", line 33, in <module>
    sys.exit(load_entry_point('signatureanalyzer', 'console_scripts', 'signatureanalyzer')())
  File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/__main__.py", line 197, in main
    **vars(args)
  File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/signatureanalyzer.py", line 89, in run_maf
    reference=reference
  File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 145, in get_spectra_from_maf
    raise KeyError('Unusual context: ' + str(e))
KeyError: "Unusual context: 'GC>GG'"

It seems that the algorithm used to generate the context string can produce strings not found in context83. The following MAF file reproduces the error when cosmic3_ID is the reference.

Chromosome Start_Position Variant_Type Reference_Allele Tumor_Seq_Allele2 Tumor_Sample_Barcode
chr10 15119471 DEL TA - MF12

Hi @d-henness, sorry we missed this. As the error suggests, one of your mutations is labeled as a DBS (doublet base substitution) even though GC>GG changes only one base, which makes it a single base substitution.

The indel error occurs because your mutation is not left-aligned. Looking at the sequence, the reference is TTGATATCTTT and your MAF shows a deletion of TA, which means the read had the sequence TTGATCTTT. A left-aligned annotation would instead report a deletion of AT starting at position 15119470. The two annotations describe the same event, but only the AT deletion is left-aligned. Hope this helps with any future issues.

Best regards,
Yo Akiyama
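
The two checks described in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `left_align_deletion` and `differing_positions` are hypothetical helpers, not part of SignatureAnalyzer, and the reference window is passed in as a plain string rather than read from a .2bit file. The sketch shows that the TA deletion shifts one base to the left and becomes an AT deletion, and that GC>GG differs at only one position.

```python
from typing import List, Tuple


def left_align_deletion(ref_seq: str, start: int, length: int) -> Tuple[int, str]:
    """Shift a deletion as far left as possible without changing the result.

    ref_seq -- reference bases around the variant (a local window, assumed given)
    start   -- 0-based index in ref_seq of the first deleted base
    length  -- number of deleted bases
    """
    if start < 0 or length <= 0 or start + length > len(ref_seq):
        raise ValueError("deletion lies outside the reference window")
    # Deleting [start-1, start+length-1) yields the same sequence as deleting
    # [start, start+length) exactly when the base entering on the left equals
    # the base leaving on the right.
    while start > 0 and ref_seq[start - 1] == ref_seq[start + length - 1]:
        start -= 1
    return start, ref_seq[start:start + length]


def differing_positions(ref: str, alt: str) -> List[int]:
    """Return the indices at which two equal-length alleles differ."""
    if len(ref) != len(alt):
        raise ValueError("alleles must have equal length")
    return [i for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


# Reference window quoted in the reply; the reported TA sits at index 4.
window = "TTGATATCTTT"
new_start, deleted = left_align_deletion(window, 4, 2)
print(new_start, deleted)                  # 3 AT
print(differing_positions("GC", "GG"))     # [1]
```

The start index moves from 4 to 3, a shift of one base, which matches the change from 15119471 to 15119470 in the reply. A pair of alleles with a single differing position, such as GC>GG, is a single base substitution and would not belong in a doublet context table.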

Thanks for your reply @yoakiyama. Can you recommend any tools that will convert mutations into their left-aligned forms? Also, I had a look through the documentation for SignatureAnalyzer and didn't notice this requirement anywhere. If left-alignment is required, it should be stated in the documentation.