nf-core/hlatyping

Unclear how to build files for base_index_path

I tried to use a custom reference file in FASTA format. I used the Yara indexer (http://packages.seqan.de/yara) to convert the FASTA file into an index format the pipeline supports. My command was something like:

yara_indexer REF.fasta.gz

It appeared to produce valid files, similar to those in your data/indices/yara directory. Unfortunately, when I ran the pipeline with them, it failed with the following error:

Error executing process > 'run_optitype (26)'
Caused by:
  Process `run_optitype (26)` terminated with an error exit status (1)
Command executed:
  OptiTypePipeline.py -i mapped_1.bam mapped_2.bam -e 1 -b 0.009 \
      -p "BI-VIE-0000-0000-0005-1606" -c config.ini --rna --outdir BI-VIE-0000-0000-0005-1606
Command exit status:
  1
Command output:
  (empty)
Command error:
  [E::idx_find_and_load] Could not retrieve index file for 'mapped_1.bam'
  [E::idx_find_and_load] Could not retrieve index file for 'mapped_2.bam'
  Traceback (most recent call last):
    File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2889, in get_loc
      return self._engine.get_loc(casted_key)
    File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: 'HLA:HLA00001'
  The above exception was the direct cause of the following exception:
  Traceback (most recent call last):
    File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/bin/OptiTypePipeline.py", line 366, in <module>
      alleles_to_keep = list(filter(is_frequent, binary.columns))
    File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/bin/OptiTypePipeline.py", line 142, in is_frequent
      return table.loc[allele_id]['4digit'] in freq_alleles and table.loc[allele_id]['flags'] == 0 or (table.loc[allele_id]['locus'] in 'HGJ')
    File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexing.py", line 879, in __getitem__
      return self._getitem_axis(maybe_callable, axis=axis)
    File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexing.py", line 1110, in _getitem_axis
      return self._get_label(key, axis=axis)
    File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexing.py", line 1059, in _get_label
      return self.obj.xs(label, axis=axis)
    File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/generic.py", line 3482, in xs
      loc = self.index.get_loc(key)
    File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2891, in get_loc
      raise KeyError(key) from err
  KeyError: 'HLA:HLA00001'

The original files in the data/indices/yara directory work fine for the same samples, so the problem appears to be with my custom reference.
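
One thing that can be checked on my side is whether the sequence IDs in my FASTA match those in the FASTA the shipped index was built from. The traceback shows the lookup of 'HLA:HLA00001' failing, so I assume (without having verified it in the OptiType code) that the allele IDs are taken from the reference sequence names. A minimal sketch of such a check; the helper names and the example headers below are made up for illustration, and only the ID 'HLA:HLA00001' comes from the error above:

```python
import gzip
import io


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_ids(handle):
    """Yield the first whitespace-delimited token of each FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def ids_missing_from(candidate_ids, known_ids):
    """Return the candidate IDs (in order) that do not occur in known_ids."""
    known = set(known_ids)
    return [seq_id for seq_id in candidate_ids if seq_id not in known]


# Hypothetical in-memory stand-ins for the two FASTA files.
custom = io.StringIO(">HLA:HLA00001 example\nATGC\n>HLA:HLA00002 example\nGGCC\n")
shipped = io.StringIO(">HLA00001 example\nATGC\n>HLA00002 example\nGGCC\n")

missing = ids_missing_from(fasta_ids(custom), fasta_ids(shipped))
print(f"{len(missing)} IDs in the custom reference are unknown to the shipped one")
```

With real files, the two `io.StringIO` objects would be replaced by `open_fasta("REF.fasta.gz")` and the FASTA behind the shipped index. If every custom ID turns out to be missing, a naming difference such as a prefix would be a likely cause, but that is a guess on my part.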

@apeltzer told me that you may have modified the files manually, so their format might not be the direct output of Yara. If that is the case, could you provide instructions on how the files should be prepared?