'RuntimeError: Error: input data lengths do not all match.' in the CalcProfile step
eumhh opened this issue · 5 comments
Hi,
I'm trying to run shapemapper on Ubuntu 20.04.
Sometimes, but not every time, it raises a RuntimeError in the 'CalcProfile' step.
Here is the command I used and the error message:
shapemapper-2.1.3/shapemapper --name sample --target /home/user/reference/myfasta.fasta --out sample --modified --R1 sample2-2_1.fq.gz --R2 sample2-2_2.fq.gz --output-aligned --random-primer-len 9
Running CalcProfile at 2022-08-23 11:23:50 . . .
ERROR: Component "CalcProfile" (RNA: sample) failed, giving the following error message:=======
Traceback (most recent call last):
File "/home/user/shapemapper-2.1.3/python/pyshapemap/../../bin/make_reactivity_profiles.py", line 396, in
raise RuntimeError(s)
RuntimeError: Error: input data lengths do not all match.
In what cases is this error raised?
Thanks,
Eum
I'm not sure what is causing your error, but it looks like you are using an old version (2.1.3). First, update to the newest version and determine if this issue is still occurring.
I found the cause and fixed it. There was a space in the fasta header ;(
After removing the space, the ShapeMapper execution completed successfully.
So glad you found the issue! It is frustrating when things fail due to a tiny typo. Could you provide more info about what happened? I might be able to make this error more informative for future users.
A FASTA file whose header line begins with a space triggered the RuntimeError.
> Test [there is a space at the beginning of a header line]
GATATCGAATTCGGGCAACCTAATACGACTCACTATAGGGACATTTGCTTCTGACACAACT
ERROR: Component "CalcProfile" (RNA: sample) failed, giving the following error message:=======
Traceback (most recent call last):
File "/home/user/shapemapper-2.1.3/python/pyshapemap/../../bin/make_reactivity_profiles.py", line 396, in
raise RuntimeError(s)
RuntimeError: Error: input data lengths do not all match.
After checking lines 384-396 of 'make_reactivity_profiles.py', I added a line to print the lengths of the inputs.
# check that seq length matches all mutation count data length and depth length
lengths = []
lengths.append(len(seq))
for k in samples:
    if counts[k] is not None:
        lengths.append(counts[k].shape[0])
    if read_depths[k] is not None:
        lengths.append(read_depths[k].shape[0])
    if effective_depths[k] is not None:
        lengths.append(effective_depths[k].shape[0])
print(lengths) ####### added line to figure out the lengths of the inputs
if len(set(lengths)) > 1:
    s = "Error: input data lengths do not all match."
    raise RuntimeError(s)
The printed lengths showed that the length of the FASTA sequence was 0.
We did not suspect the FASTA file at first because there were no problems in the alignment steps.
But removing the space from the header line fixed the error.
>Test [No space at the beginning of a header line]
GATATCGAATTCGGGCAACCTAATACGACTCACTATAGGGACATTTGCTTCTGACACAACT
I recommend double-checking the input format before running ShapeMapper.
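A check like this can be scripted. The following is a minimal sketch, not part of ShapeMapper: `find_suspicious_headers` is a hypothetical helper name, and it only flags header lines with whitespace before or directly after the `>`, or with no name at all. It does not validate anything else about the file.

```python
def find_suspicious_headers(lines):
    """Return (line_number, line) pairs for FASTA header lines that look malformed.

    Hypothetical helper for illustration only; not part of ShapeMapper.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        stripped = line.lstrip()
        if not stripped.startswith(">"):
            continue  # sequence line or blank line, nothing to check here
        name = stripped[1:]
        # Flag whitespace before '>', an empty name, or whitespace right after '>'.
        if line != stripped or name == "" or name[0].isspace():
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    bad = ["> Test", "GATATCGAATTC"]
    good = [">Test", "GATATCGAATTC"]
    print(find_suspicious_headers(bad))   # [(1, '> Test')]
    print(find_suspicious_headers(good))  # []
```

To check a real file, pass an open file handle, for example `find_suspicious_headers(open("myfasta.fasta"))`, and fix any reported lines before starting a run.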
Thanks,
Eum
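On making the error more informative: one possible approach is to label each length so the message itself shows which input is off. This is only a sketch under assumptions. `check_lengths` and the labels are hypothetical and are not taken from `make_reactivity_profiles.py`, and the numbers in the usage example are made up.

```python
def check_lengths(named_lengths):
    """Raise RuntimeError listing every input length if they do not all match.

    named_lengths: dict mapping a descriptive label to a length.
    Hypothetical helper for illustration; not ShapeMapper's actual code.
    """
    if len(set(named_lengths.values())) > 1:
        details = ", ".join(
            f"{label}={length}" for label, length in named_lengths.items()
        )
        raise RuntimeError(
            f"Error: input data lengths do not all match ({details})."
        )


if __name__ == "__main__":
    # Illustrative values only: a sequence of length 0 next to count arrays of length 100.
    try:
        check_lengths({"sequence": 0, "modified counts": 100, "modified depths": 100})
    except RuntimeError as err:
        print(err)
```

With a message of this shape, a sequence length of 0 would point at the FASTA file right away, without having to add a `print` to the script.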