KeyError when using cosmic3_ID and cosmic3_DBS
I'm getting an error when running signatureanalyzer with the cosmic3_ID reference. The full command I run is
signatureanalyzer -n 10 --reference cosmic3_ID --hg_build /data/shared/hg38/hg38.2bit --objective poisson --max_iter 50000 --prior_on_H L1 --prior_on_W L1 -o cosmic3_ID all.maf
The error message is
---------------------------------------------------------
---------- S I G N A T U R E A N A L Y Z E R ----------
---------------------------------------------------------
* Creating output dir at cosmic3_ID
* Using hg38 build
* Using cosmic3_ID signatures
* Loading spectra from all.maf
* Mapping contexts: 2114492 / 2114493
Traceback (most recent call last):
File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 252, in get_spectra_from_maf
maf['context83.num'] = maf['context83.word'].apply(context83.__getitem__)
File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/series.py", line 4357, in apply
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/apply.py", line 1043, in apply
return self.apply_standard()
File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/apply.py", line 1102, in apply_standard
convert=self.convert_dtype,
File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
KeyError: '2delm2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dylan/miniconda3/bin/signatureanalyzer", line 33, in <module>
sys.exit(load_entry_point('signatureanalyzer', 'console_scripts', 'signatureanalyzer')())
File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/__main__.py", line 197, in main
**vars(args)
File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/signatureanalyzer.py", line 89, in run_maf
reference=reference
File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 254, in get_spectra_from_maf
raise KeyError('Unusual context: ' + str(e))
KeyError: "Unusual context: '2delm2'"
I get a similar error when using cosmic3_DBS as the reference
---------------------------------------------------------
---------- S I G N A T U R E A N A L Y Z E R ----------
---------------------------------------------------------
* Creating output dir at cosmic3_DBS
* Using hg38 build
* Using cosmic3_DBS signatures
* Loading spectra from all.maf
Traceback (most recent call last):
File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 143, in get_spectra_from_maf
maf['context78.num'] = contig.apply(context78.__getitem__)
File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/series.py", line 4357, in apply
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/apply.py", line 1043, in apply
return self.apply_standard()
File "/home/dylan/miniconda3/lib/python3.7/site-packages/pandas/core/apply.py", line 1102, in apply_standard
convert=self.convert_dtype,
File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
KeyError: 'GC>GG'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dylan/miniconda3/bin/signatureanalyzer", line 33, in <module>
sys.exit(load_entry_point('signatureanalyzer', 'console_scripts', 'signatureanalyzer')())
File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/__main__.py", line 197, in main
**vars(args)
File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/signatureanalyzer.py", line 89, in run_maf
reference=reference
File "/home/dylan/signatureanalyzer/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 145, in get_spectra_from_maf
raise KeyError('Unusual context: ' + str(e))
KeyError: "Unusual context: 'GC>GG'"
It seems that the algorithm used to generate the contig string can produce contigs that are not found in context83. The following MAF file reproduces the error when cosmic3_ID is used as the reference.
Chromosome | Start_Position | Variant_Type | Reference_Allele | Tumor_Seq_Allele2 | Tumor_Sample_Barcode |
---|---|---|---|---|---|
chr10 | 15119471 | DEL | TA | - | MF12 |
Hi @d-henness, sorry we missed this. As the error suggests, one of your mutations is labeled as a DBS despite being GC>GG, which changes only one base and is therefore a single base substitution. The indel error occurs because your mutation is not left-aligned. Looking at the sequence, we see that the reference is TTGATATCTTT and your MAF shows a deletion of TA. That means the read had the sequence TTGATCTTT. A left-aligned annotation would instead indicate a deletion of AT from position 15119470. The two annotations are equivalent, since they produce the same sequence, but the AT deletion is the left-aligned one. Hope this helps for any issues in the future.
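To make the two points above concrete, here is a minimal sketch in Python. It is not part of SignatureAnalyzer: the function names `left_align_deletion` and `differing_positions` are hypothetical, and the sketch assumes the deletion is given as a 0-based index into a short reference window rather than as a genomic coordinate. It shows how a deletion can be shifted left without changing the resulting sequence, and how a doublet entry that changes only one base can be spotted.

```python
def left_align_deletion(ref_seq, start, deleted):
    """Shift a deletion as far left as possible within ref_seq.

    ref_seq : reference window as a string (assumption: already extracted)
    start   : 0-based index in ref_seq of the first deleted base
    deleted : the deleted bases, e.g. "TA"
    Returns the new (start, deleted) pair.
    """
    if ref_seq[start:start + len(deleted)] != deleted:
        raise ValueError("deleted bases do not match the reference at start")
    # While the base just left of the deletion equals the last deleted base,
    # the deletion can move one position left and still yield the same sequence.
    while start > 0 and ref_seq[start - 1] == deleted[-1]:
        deleted = ref_seq[start - 1] + deleted[:-1]
        start -= 1
    return start, deleted


def differing_positions(ref, alt):
    """Return the indices at which two equal-length alleles differ."""
    return [i for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


# Example from this thread: reference TTGATATCTTT with TA deleted at index 4.
ref_window = "TTGATATCTTT"
new_start, new_deleted = left_align_deletion(ref_window, 4, "TA")
print(new_start, new_deleted)  # 3 AT

# Index 4 corresponds to Start_Position 15119471 in the MAF row above,
# so shifting one base left gives 15119470.
window_offset = 15119471 - 4
new_position = window_offset + new_start

# Both annotations give the same sequence after the deletion.
before = ref_window[:4] + ref_window[4 + 2:]
after = ref_window[:new_start] + ref_window[new_start + len(new_deleted):]

# GC>GG differs at only one position, so it is effectively an SBS (C>G).
dbs_diff = differing_positions("GC", "GG")
```

For real data the reference window would have to be read from the genome build in use, and the shift repeated until the loop stops, which the function above already does for a single window.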
Best regards,
Yo Akiyama
Thanks for your reply @yoakiyama. Can you recommend any tools that will convert mutations into their left-aligned forms? Also, I had a look through the documentation for SignatureAnalyzer and didn't notice this requirement anywhere. It should be stated somewhere if this is a requirement.