struct.error
Something is going wrong: I get "struct.error: 'i' format requires -2147483648 <= number <= 2147483647". For the full error, please see error.txt
error.txt
This is my command:
isONclust --ont --fastq ../../00.all_length/reads_fitered.fq --outfolder out --t 16 1>log.txt 2>error
log.txt
In the output folder "out" there are sorted.fastq and logfile.txt.
Could you please help me solve this error?
This is an error related to Python multiprocessing and the amount of data that can be sent between processes. I believe they fixed this limitation in Python 3.8 (ref here).
Therefore, could you try running/reinstalling isONclust with Python 3.8? It should be the version conda picks automatically when running
conda create -n isonclust python=3 pip
Otherwise, I believe
conda create -n isonclust python=3.8 pip
forces the version to 3.8.
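For background, here is a minimal sketch of my own (not isONclust code) of where that message comes from, assuming the cause is the 4-byte signed length header that multiprocessing prefixed to each pickled payload before Python 3.8; anything larger than about 2 GiB overflows it:
import struct

payload_size = 3 * 1024**3  # pretend a worker sent back ~3 GiB of pickled results

try:
    struct.pack("!i", payload_size)  # 4-byte signed header used before Python 3.8
except struct.error as e:
    print(e)  # 'i' format requires -2147483648 <= number <= 2147483647

struct.pack("!Q", payload_size)  # Python 3.8+ can fall back to an 8-byte header, so this size is fine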
How many reads does your dataset consist of?
Ok, I will try to use Python 3.8. The number of reads is 11,347,221, is that too many?
Good, let me know how it goes!
It depends on how well they cluster. I have clustered a 10M-read dataset in 1-2 days with 32-50 cores, but that was a pre-processed dataset where all "spurious reads" (the ones not from transcripts, such as intron-priming reads) had been filtered out based on a reference. I would recommend using as many cores as possible on your node (if you have more than 16).
Two additional tips:
- Include the parameter
--d 200000
This will omit printing log information to stderr every 10k reads (it will instead print every 200k reads). This is another possible cause of what could have "clogged the pipe" between the multiprocessing workers and triggered the error (but I'm not sure). At any rate, the log information is not useful at this point and should be reduced.
- You can run with the parameter
--use_old_sorted_file
This will save time by not rerunning the sorting step at the beginning (it looks like that took almost 9 hours). An example command combining both tips is shown below.
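For example, a rerun along these lines should apply both tips (reusing the paths from your first command and the same output folder "out", so the old sorted file is found; adjust --t to however many cores your node has):
isONclust --ont --fastq ../../00.all_length/reads_fitered.fq --outfolder out --t 16 --d 200000 --use_old_sorted_file 1>log.txt 2>error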
Thank you, I'm running it now with 16 cores. The max memory of the server is 380G, is that enough?
If 16 is the most that you have, I guess you are bound to it. Otherwise I would recommend using more cores (as many as you have on the node). The number of cores doesn't have to be a multiple of 2. The total memory should not be an issue.
A final comment on heuristics, in case none of (1) using Python 3.8, (2) more cores, or (3) reduced printing to stderr solves it. The error happened in the last batch, which contains 2,172,864 reads (they are at the bottom of the sorted.fastq file in the isONclust output folder), and these are typically the shortest and/or worst-quality reads. A subset of these reads could be removed or processed individually and it should work.
This is inferred from this line in the log.txt: Nr reads in batches: [202520, 276520, 331199, 383795, 435138, 482185, 523946, 569871, 621579, 684407, 761473, 846284, 943406, 1067587, 1266904, 2172864]
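If it comes to that, here is a rough sketch of my own (not part of isONclust) for splitting that final batch off the bottom of sorted.fastq, so it can be clustered separately or dropped; it assumes plain 4-line FASTQ records, and the output file names are just placeholders:
N_LAST_BATCH = 2_172_864  # size of the final batch, taken from the log line above

# First pass: count the reads so we know where the last batch starts.
with open("out/sorted.fastq") as fq:
    n_reads = sum(1 for _ in fq) // 4

cut = n_reads - N_LAST_BATCH  # index of the first read belonging to the last batch

# Second pass: stream each 4-line record into one of two files.
with open("out/sorted.fastq") as fq, \
        open("sorted_main.fastq", "w") as main_out, \
        open("sorted_last_batch.fastq", "w") as tail_out:
    for i in range(n_reads):
        record = [fq.readline() for _ in range(4)]  # @header, sequence, +, qualities
        (main_out if i < cut else tail_out).writelines(record)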
Thank you, the suggestions worked. The final command is (on Python 3.8):
isONclust --ont --fastq reads_fitered.onlyAll.fq --outfolder out --t 32 --d 200000 --use_old_sorted_file