Unclear how to build files for base_index_path
I tried to use a custom reference. I have the reference in FASTA format and used the Yara indexer (http://packages.seqan.de/yara) to convert it into the index format the pipeline expects. My command was something like:
yara_indexer REF.fasta.gz
It seemed to produce valid files, similar to the files in your data/indices/yara directory. Unfortunately, when I ran the pipeline, it failed with the following error:
Error executing process > 'run_optitype (26)'
Caused by:
Process `run_optitype (26)` terminated with an error exit status (1)
Command executed:
OptiTypePipeline.py -i mapped_1.bam mapped_2.bam -e 1 -b 0.009 \
-p "BI-VIE-0000-0000-0005-1606" -c config.ini --rna --outdir BI-VIE-0000-0000-0005-1606
Command exit status:
1
Command output:
(empty)
Command error:
[E::idx_find_and_load] Could not retrieve index file for 'mapped_1.bam'
[E::idx_find_and_load] Could not retrieve index file for 'mapped_2.bam'
Traceback (most recent call last):
File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2889, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'HLA:HLA00001'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/bin/OptiTypePipeline.py", line 366, in <module>
alleles_to_keep = list(filter(is_frequent, binary.columns))
File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/bin/OptiTypePipeline.py", line 142, in is_frequent
return table.loc[allele_id]['4digit'] in freq_alleles and table.loc[allele_id]['flags'] == 0 or (table.loc[allele_id]['locus'] in 'HGJ')
File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexing.py", line 879, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexing.py", line 1110, in _getitem_axis
return self._get_label(key, axis=axis)
File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexing.py", line 1059, in _get_label
return self.obj.xs(label, axis=axis)
File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/generic.py", line 3482, in xs
loc = self.index.get_loc(key)
File "/opt/conda/envs/nf-core-hlatyping-1.2.1dev/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2891, in get_loc
raise KeyError(key) from err
KeyError: 'HLA:HLA00001'
The original files in the data/indices/yara directory work fine for the same samples, so the problem appears to be with my custom reference.
@apeltzer told me that you may have modified the files manually, in which case their format would not be the direct output of Yara. If that is true, could you provide instructions on how the files should be prepared?
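In case it helps narrow this down: the KeyError above is raised when OptiType looks up the allele ID 'HLA:HLA00001' in its allele table, so my guess (not confirmed) is that the sequence IDs in my FASTA headers do not match the IDs that table uses. Below is a minimal Python sketch for checking this. The helper names `fasta_ids` and `missing_ids` are my own, and the list of known IDs has to be supplied by hand, since I do not know where the table's IDs are best read from.

```python
import gzip
from pathlib import Path


def fasta_ids(path):
    """Return the sequence IDs (first token of each '>' header) in a FASTA file.

    Plain and gzip-compressed files are both accepted; compression is
    detected from the '.gz' suffix only.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    ids = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


def missing_ids(fasta_path, known_ids):
    """Return FASTA sequence IDs that are absent from known_ids.

    known_ids is an assumption: any iterable of allele IDs that the
    downstream allele table is expected to contain.
    """
    known = set(known_ids)
    return [seq_id for seq_id in fasta_ids(fasta_path) if seq_id not in known]
```

For example, `missing_ids("REF.fasta.gz", known)` would list every header ID in my reference that is not in `known`. If IDs such as 'HLA:HLA00001' show up there while the table holds a differently formatted ID, the headers would need to be rewritten before indexing.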