gene without cDNA sequence causes error when running quant.py
adeschen opened this issue · 0 comments
Hi,
The quantification step is crashing for some of our samples. The error is always the same:
Traceback (most recent call last):
File "/grid/tuveson/home/software_elzar/bin/miniconda/miniconda3_v4.12.0/miniconda3/envs/arcasHLA-v0.5.0_db3.47.0/share/arcas-hla-0.5.0-0/scripts/quant.py", line 250, in <module>
allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele
There is an easy fix to this problem. The current quant.py core (line 250):
for allele_id, allele in genotype.items():
allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele
can be modified to check if the key exists and if not, create one:
for allele_id, allele in genotype.items():
if not allele_id[:-1] in allele_results:
allele_results[allele_id[:-1]] = defaultdict(float)
allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele
That modification removes the error but don't remove the cause of the problem. The quant.py script uses the .p file to populate the allele_results
dictionary. The .p contains both the genes
and the genotype
dictionaries. The genes
dictionary is used by the quant.py script to create the keys of the allele_results
dictionary. However, it appears that the genes
dictionary is something missing genes that have alleles in the genotype
dictionary.
It seams the cause of this problem is related to the lack of cDNA sequence for some allele as managed in the customize.py script. The script generates the .p file that is going to be used by the quant.py script. Both the genes
and genotype
dictionaries are saved in the .p file. However, the genes
dictionary won't contains the genes that are missing cDNA sequences for both alleles (line 137 of customize.py) as the condition for the loop is not respected:
for seq in sequences:
hla_idx[allele_id].append(str(idx))
genes[gene].append(allele_id)
Can you address this issue.
Best regards
Astrid