ValueError("All arrays must be of the same length")
smilenaderi opened this issue · 4 comments
Bug Description
When I run annopro on the following FASTA file, it fails with the error shown in the logs below:
>seq-2
MKKKKKKKLKKLKKKLKKKLKKKKKLLLLLLLLKKKKKKK
>seq-9
MKKKIKKIKKKIEKKKKKKLKKLKKKKKKKKLLLLLLLLL
>seq-10
MSEKFSEIAEKYDEERILSRSAGELAELTRELGLKPGDRVLDVGCGTGYLTLPLAERVGPEGTVIGIDRSEEMLARARERAAAAGLSNVEFQVADAEALPFPDESFDLVTCRLVLHHLPDPAKALREMRRVLKPGGRFVVSDWDASSMAFPDEEAELAERLRRYAEARAAAGGERDALRRALEAAGFRDVTVRSLTAWRRRAGEAAAAAL
>seq-13
MKKKKKLKKKLKKKKKKKK
Runtime Environment
Fresh install of requirements
Logs
annopro -i test_proteins.fasta -o output-test
Download cafa4.dmnd...
100% [........................................................................] 46988123 / 46988123
Validate md5sum of cafa4.dmnd...
diamond v2.1.0.154 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: output-test
#Target sequences to report alignments for: 25
Opening the database... [0.042s]
Database: /home/ubuntu/.annopro/data/cafa4.dmnd (type: Diamond database, sequences: 87514, letters: 44798577)
Block size = 2000000000
Opening the input file... [0s]
Opening the output file... [0s]
Loading query sequences... [0s]
Masking queries... [0.001s]
Algorithm: Double-indexed
Building query histograms... [0s]
Loading reference sequences... [0.055s]
Masking reference... [0.588s]
Initializing temporary storage... [0s]
Building reference histograms... [0.493s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array... [0.163s]
Building query seed array... [0s]
Computing hash join... [0.004s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
Building reference seed array... [0.192s]
Building query seed array... [0s]
Computing hash join... [0.002s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4.
Building reference seed array... [0.213s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4.
Building reference seed array... [0.154s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4.
Building reference seed array... [0.155s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4.
Building reference seed array... [0.19s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4.
Building reference seed array... [0.211s]
Building query seed array... [0s]
Computing hash join... [0.002s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4.
Building reference seed array... [0.154s]
Building query seed array... [0s]
Computing hash join... [0.004s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Deallocating buffers... [0.004s]
Clearing query masking... [0s]
Computing alignments... Loading trace points... [0.001s]
Sorting trace points... [0s]
Computing alignments... [0s]
Deallocating buffers... [0s]
Loading trace points... [0s]
[0.002s]
Deallocating reference... [0.002s]
Loading reference sequences... [0s]
Deallocating buffers... [0s]
Deallocating queries... [0s]
Loading query sequences... [0s]
Closing the input file... [0s]
Closing the output file... [0s]
Closing the database... [0.002s]
Cleaning up... [0s]
Total time = 2.766s
Reported 21 pairwise alignments, 21 HSPs.
1 queries aligned.
Invalid feature 0.6934-309 for seq-13 at line 596
Invalid feature 0.6934-309 for seq-13 at line 596
Invalid feature 0.5127-315 for seq-13 at line 596
Invalid feature 0.5127-315 for seq-13 at line 596
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/annopro/bin/annopro", line 8, in <module>
sys.exit(console_main())
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/__init__.py", line 27, in console_main
main(
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/__init__.py", line 71, in main
process(
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/data_procession/__init__.py", line 8, in process
data = Data_process(protein_file=profeat_file,
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/data_procession/data_predict.py", line 36, in __init__
self.__data__()
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/data_procession/data_predict.py", line 39, in __data__
proteins_f = profeat_to_df(self.protein_file)
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/profeat/__init__.py", line 69, in profeat_to_df
return pd.DataFrame(feature_list).T
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/pandas/core/frame.py", line 636, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 502, in dict_to_mgr
return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 120, in arrays_to_mgr
index = _extract_index(arrays)
File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 674, in _extract_index
raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
The error is likely due to a problem with profeat when calculating protein features, possibly because profeat cannot recognize your input sequence. If it is convenient for you, please provide us with the complete sequence file for analysis or use our website: https://idrblab.org/annopro
We recently reproduced the same bug during testing, and found that there were multiple protein sequences with the same ID. Perhaps you have encountered a similar problem and can investigate it.
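A quick way to check an input file for repeated IDs, as a rough sketch only: it assumes the ID is the first whitespace-delimited token after `>`, and the helper names `fasta_ids` and `duplicated_ids` are made up for illustration (they are not part of annopro or profeat).

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the ID (first token after '>') of every FASTA header line."""
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            # An empty header yields an empty-string ID.
            yield tokens[0] if tokens else ""


def duplicated_ids(lines):
    """Return a sorted list of IDs that occur more than once."""
    counts = Counter(fasta_ids(lines))
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# Toy input with a deliberately repeated ID (not taken from the report above).
example = """>seq-2
MKKK
>seq-9
MKKI
>seq-2
MSEK
""".splitlines()

print(duplicated_ids(example))  # ['seq-2']
```

To check a real file, pass an open file handle instead of `example`, since iterating over it yields lines in the same way.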
I ran it with the data from https://idrblab.org/annopro, but the problem persists: ValueError: All arrays must be of the same length
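This problem is caused by passing profeat an amino acid sequence shorter than 30 residues. That is consistent with the "Invalid feature ... for seq-13" messages in the log above: seq-13 is the only sequence in the reported file that is shorter than 30 residues.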
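As a possible workaround, short sequences can be filtered out before running annopro. The sketch below is an illustration under assumptions, not an official fix: the 30-residue threshold is the one reported in this thread, and `read_fasta` and `split_by_length` are hypothetical helpers written for this example.

```python
MIN_LENGTH = 30  # threshold reported in this thread; adjust if it turns out to differ


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_by_length(records, min_length=MIN_LENGTH):
    """Split records into (kept, dropped) by sequence length."""
    kept = [(h, s) for h, s in records if len(s) >= min_length]
    dropped = [(h, s) for h, s in records if len(s) < min_length]
    return kept, dropped


# Two of the sequences from the original report.
example = """>seq-2
MKKKKKKKLKKLKKKLKKKLKKKKKLLLLLLLLKKKKKKK
>seq-13
MKKKKKLKKKLKKKKKKKK
""".splitlines()

kept, dropped = split_by_length(read_fasta(example))
print("kept:", [h for h, _ in kept])
print("dropped:", [h for h, _ in dropped])
```

The kept records can then be written to a new FASTA file and passed to `annopro -i` in place of the original input.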