kage-genotyper/kage

kmer_mapper not working

teepean opened this issue · 3 comments

I am trying to test kage 0.11.14, but for some reason kmer_mapper does not work. The command and error message:

kmer_mapper map -b index_2548all_uncompressed.npz -f 210435_S26_L001_R1_001.fq -o kmer_counts
Traceback (most recent call last):
File "/home/useri/.local/bin/kmer_mapper", line 8, in <module>
sys.exit(main())
File "/home/useri/.local/lib/python3.10/site-packages/kmer_mapper/command_line_interface.py", line 29, in main
run_argument_parser(sys.argv[1:])
File "/home/useri/.local/lib/python3.10/site-packages/kmer_mapper/command_line_interface.py", line 169, in run_argument_parser
args.func(args)
File "/home/useri/.local/lib/python3.10/site-packages/kmer_mapper/command_line_interface.py", line 85, in map_bnp
kmer_index = _get_kmer_index_from_args(args)
File "/home/useri/.local/lib/python3.10/site-packages/kmer_mapper/util.py", line 51, in _get_kmer_index_from_args
kmer_index = IndexBundle.from_file(args.index_bundle).indexes["KmerIndex"]
File "/home/useri/.local/lib/python3.10/site-packages/kage/indexing/index_bundle.py", line 20, in __getitem__
return self.index[e]
KeyError: 'KmerIndex'

ivargr commented

Hi!

Sorry, there seems to have been a mismatch in the code after I updated some indexes. I have pushed a fix, so this should work in the latest version of kmer_mapper. Could you try updating kmer_mapper and see if it works then? You will need version 0.0.31, so pip install kmer_mapper==0.0.31 should hopefully fix it.
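
To confirm which version ended up installed after upgrading, something like the following sketch can help. It is only an illustration: `parse_version` and `is_new_enough` are hypothetical helper names, and it assumes the package is registered under the distribution name `kmer_mapper` and uses a plain dotted version string.

```python
from importlib import metadata

# Minimum version named in this thread.
REQUIRED = (0, 0, 31)


def parse_version(text):
    """Turn a plain dotted version such as '0.0.31' into a tuple of ints.

    Assumption: no suffixes like 'rc1' or '.dev0'; those would raise ValueError.
    """
    return tuple(int(part) for part in text.split("."))


def is_new_enough(installed, required=REQUIRED):
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= required


if __name__ == "__main__":
    try:
        installed = metadata.version("kmer_mapper")
    except metadata.PackageNotFoundError:
        print("kmer_mapper is not installed in this environment")
    else:
        status = "ok" if is_new_enough(installed) else "too old, upgrade needed"
        print(f"kmer_mapper {installed}: {status}")
```

Run it with the same Python interpreter that launches kmer_mapper, since a user-level install under ~/.local and a virtual environment can hold different versions.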

Thanks for bringing this to my attention :)

Thanks! It looks like it started processing, but an error message pops up every now and then:

Process Process-13:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/useri/.local/lib/python3.10/site-packages/shared_memory_wrapper/shared_array_map_reduce.py", line 51, in __call__
job_result = self.function(*data, run_specific_data)
File "/home/useri/.local/lib/python3.10/site-packages/kmer_mapper/command_line_interface.py", line 39, in map_cpu
hashes = get_kmer_hashes_from_chunk_sequence(chunk_sequence, kmer_size)
File "/home/useri/.local/lib/python3.10/site-packages/kmer_mapper/util.py", line 73, in get_kmer_hashes_from_chunk_sequence
bnp.as_encoded_array(chunk_sequence, bnp.DNAEncoding), kmer_size).ravel().raw().astype(np.uint64)
File "/home/useri/.local/lib/python3.10/site-packages/bionumpy/encoded_array.py", line 542, in as_encoded_array
return target_encoding.encode(s)
File "/home/useri/.local/lib/python3.10/site-packages/bionumpy/encoded_array.py", line 52, in encode
r = self._ragged_array_as_encoded_array(data)
File "/home/useri/.local/lib/python3.10/site-packages/bionumpy/encoded_array.py", line 73, in _ragged_array_as_encoded_array
data = self.encode(s.ravel())
File "/home/useri/.local/lib/python3.10/site-packages/bionumpy/encoded_array.py", line 59, in encode
out = EncodedArray(self._encode(data), self)
File "/home/useri/.local/lib/python3.10/site-packages/bionumpy/encodings/alphabet_encoding.py", line 33, in _encode
raise EncodingError(f"Error when encoding {''.join(chr(c) for c in byte_array.ravel()[0:100])} "
bionumpy.encodings.exceptions.EncodingError: ("Error when encoding TTACCTCAAGGTTATCGACGTGCAGGGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCAACCTATCTCGTATGCCGTCTTCTGCTTGAAAATGG to AlphabetEncoding. Invalid character(s): ['N'][78]", 919813)

ivargr commented

Sorry for the late reply!

The error message should have been clearer, but it occurs because there are Ns in your sequences, which the DNA encoding does not accept. Am I correct that some of your reads have Ns in them?
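
One way to check this is to scan the FASTQ file directly. The sketch below is not part of kmer_mapper or bionumpy; `count_reads_with_n` is a hypothetical helper, and it assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines).

```python
def count_reads_with_n(fastq_path):
    """Return (total_reads, reads_with_n) for a four-line-per-record FASTQ file."""
    total = 0
    with_n = 0
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            # In each four-line record, the second line holds the sequence.
            if line_number % 4 == 1:
                total += 1
                if "N" in line.upper():
                    with_n += 1
    return total, with_n


# Example call, using the file name from the command in this issue:
# total, with_n = count_reads_with_n("210435_S26_L001_R1_001.fq")
# print(f"{with_n} of {total} reads contain at least one N")
```

A non-zero second number would confirm that the EncodingError comes from N bases in the input reads.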