Process secondary_nt_calc_lca failed

Question

pengouy opened this issue 5 months ago · 1 comments

Hi, I encountered a new error while running hecatomb v1.3.2. Here is the relevant log output:

15:23:41.923 [WARN] taxid 1965493 was merged into 3070820
15:23:41.923 [WARN] taxid 45219 was merged into 3052307
15:23:41.924 [ERRO] bufio.Scanner: token too long
15:23:42.228 [ERRO] lineage-field (4) out of range (3):ZS32:1:1.654e-04:204011

The "time" resource for this step was set to "sml". When I first hit this error, I thought it was caused by an insufficient time limit, because this step needed more than 1 hour to finish with my own data. I therefore changed "sml" to "ram" in the read_annotation.smk file, but it still does not work. Please look into this issue.

Answer 1 · 2024-07-10T02:02:34.000Z

It seems that this error occurred while executing taxonkit in an MMseqs2 environment, and I found related error information about taxonkit in [https://github.com/shenwei356/taxonkit/issues/75]. The author updated this part of the code in a later version. Because reinstalling hecatomb is time-consuming and a newer taxonkit might conflict with other dependencies, I did not try a taxonkit version >0.8.0.
I have now solved this problem with the following steps:

1. Modify the source code of taxonkit v0.8.0 by adding a "buffer-size" flag in the lca.go file to increase the buffer space, following the latest version.
2. Compile the code and replace the binary file in this directory with the new build: anaconda3/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/conda/1d30c962f1392466f06c4a5792ffe366_/bin
3. Re-run the job. The [ERRO] bufio.Scanner error no longer occurred.