roseorenbuch/arcasHLA-quant

Problems running `customize`: cannot create dummy HLA dict?


I have no problems running `extract` and `genotype` as part of my Snakemake pipeline. I am now testing adding `quant` to it.

My test environment is a minimal mamba env containing only arcasHLA and its requirements. `git-lfs` is installed, up to date, and accessible everywhere, including inside that environment.

I got this error when trying to run `customize` as suggested in #6:

```
$ arcasHLA customize --transcriptome chr6 --genotype A0508148.genotype.json -o ./ref
None
{<genotyped output removed for privacy>}
Traceback (most recent call last):
  File "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/scripts/customize.py", line 317, in <module>
    build_custom_reference(subject, genotype, args.grouping, args.transcriptome, temp)
  File "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/scripts/customize.py", line 92, in build_custom_reference
    transcriptome.append(dummy_HLA_dict[transcript])
```

I looked at your code around that line and manually ran the statements that build `dummy_HLA_dict`:

```
$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21)
[GCC 9.4.0] on linux

>>> from Bio import SeqIO
>>> dummy_hla_fa = "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/dat/ref/GRCh38.chr6.HLA.fasta"
>>> dummy_HLA_dict = SeqIO.to_dict(SeqIO.parse(dummy_hla_fa, 'fasta'))
>>> dummy_HLA_dict
{}
```
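The empty result can be reproduced without Biopython. The sketch below is a minimal stdlib illustration under stated assumptions: `read_fasta` is a hypothetical, simplified FASTA reader (not Biopython's parser) that, matching the `SeqIO.to_dict` result observed here, produces no records when the file contains no `>` header lines. Indexing the resulting empty dict then raises `KeyError`, the same kind of lookup that fails at line 92 of `customize.py`. The transcript key used is a made-up placeholder.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical, simplified FASTA reader (not Biopython's implementation).

    Text before the first '>' header is ignored, so input with no header
    lines at all produces an empty dict.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # Flush the previous record before starting a new one
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
        # Lines seen before any '>' header fall through and are dropped
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Contents of the GRCh38.chr6.HLA.fasta quoted in this issue
LFS_POINTER = [
    "version https://git-lfs.github.com/spec/v1\n",
    "oid sha256:b0df533500a0b7418214bb36a05368155148415f1a557f0a05dde78afb248b00\n",
    "size 1693517\n",
]
# A tiny made-up FASTA for comparison
real_fasta = [">HLA-A*01:01\n", "ACGT\n", "ACGT\n", ">HLA-B*07:02\n", "TTGG\n"]

print(read_fasta(real_fasta))  # {'HLA-A*01:01': 'ACGTACGT', 'HLA-B*07:02': 'TTGG'}
empty = read_fasta(LFS_POINTER)
print(empty)  # {}

try:
    empty["some_transcript_id"]  # hypothetical key; any lookup fails on an empty dict
except KeyError as err:
    print("KeyError:", err)
```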

So only an empty dictionary is returned. Maybe it's an issue with git-lfs, but as I said, I made sure it's installed and working, and genotyping ran without problems.

The fasta file in question looks like this:

```
$ cat ~/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/dat/ref/GRCh38.chr6.HLA.fasta
version https://git-lfs.github.com/spec/v1
oid sha256:b0df533500a0b7418214bb36a05368155148415f1a557f0a05dde78afb248b00
size 1693517
```
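Those three lines (`version`, `oid sha256:…`, `size …`) are the layout of a Git LFS pointer file rather than FASTA sequence data. A minimal stdlib sketch for checking this, assuming that layout: `parse_lfs_pointer` reports whether a file is still a pointer, and `matches_pointer` checks a fetched file against the recorded oid and size. Both names are hypothetical helpers, not part of arcasHLA, and the 1024-byte cutoff for "small enough to be a pointer" is an assumption of this sketch.

```python
import hashlib
import os
import tempfile
from typing import Dict, Optional

LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"
# Assumption: pointer files are tiny text files; anything larger is treated as real data.
MAX_POINTER_BYTES = 1024


def parse_lfs_pointer(path: str) -> Optional[Dict[str, str]]:
    """Return the pointer's fields (e.g. 'oid', 'size') if `path` looks like a
    Git LFS pointer file, otherwise None. Hypothetical helper, not part of arcasHLA."""
    if os.path.getsize(path) > MAX_POINTER_BYTES:
        return None
    with open(path, "rb") as fh:
        try:
            text = fh.read().decode("utf-8")
        except UnicodeDecodeError:
            return None
    lines = text.splitlines()
    if not lines or lines[0].strip() != LFS_VERSION_LINE:
        return None
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value.strip()
    return fields if "oid" in fields and "size" in fields else None


def matches_pointer(path: str, oid: str, size: int) -> bool:
    """True if the file at `path` has the byte size and sha256 oid recorded in a pointer."""
    if os.path.getsize(path) != size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large references need not fit in memory
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return oid == "sha256:" + digest.hexdigest()


# Demo: write the pointer text from this issue to a temporary file and inspect it
with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "GRCh38.chr6.HLA.fasta")
    with open(fasta, "w") as fh:
        fh.write(
            "version https://git-lfs.github.com/spec/v1\n"
            "oid sha256:b0df533500a0b7418214bb36a05368155148415f1a557f0a05dde78afb248b00\n"
            "size 1693517\n"
        )
    info = parse_lfs_pointer(fasta)
    if info is not None:
        print(f"{fasta} is an LFS pointer, not sequence data: {info}")
```

Pointed at the installed `GRCh38.chr6.HLA.fasta`, `parse_lfs_pointer` would return the `oid` and `size` fields rather than `None`. After fetching the real object, `matches_pointer(path, info["oid"], int(info["size"]))` should return `True`; the pointer file itself can never match, because its own size is far below the recorded `size`.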

Any thoughts on what could be going wrong?