tszalay/poreseq

Error Correction with multiple datasets


Dear Tamas,

I am planning a de novo assembly of E. coli using the Canu assembler. While setting up the pipeline, I thought of combining Canu with poreseq to error correct the assembly.

I am combining 4 datasets into one high-quality 2D dataset and running the assembly on that. Could you tell me whether I should error correct the assembled data with each dataset individually, feeding each corrected output into the next round for 4 rounds in total? Or is it possible to give the paths to all 4 fast5 folders and do it in one shot?

I hope that makes sense.

Thanks in advance.
Athul

Hi Athul,

You should have no problem generating the assembly using all of the data and then using all 4 datasets to error correct in one pass. If you have problems, however, you may wish to try nanopolish instead - I believe they have done a better job keeping up with the changes to Oxford's format and files than I have.
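
If it helps, here is a rough sketch of one way to present the 4 fast5 folders as a single pooled folder for a one-pass run. It only symlinks files and does not call poreseq or nanopolish at all; the function name `pool_fast5`, the `ds<N>_` prefix, and the example paths are made up for illustration, and whether your tool needs a single folder in the first place is an assumption you should check against its own documentation.

```python
import os
from pathlib import Path


def pool_fast5(dataset_dirs, pooled_dir):
    """Symlink every .fast5 file from several dataset folders into one folder.

    Hypothetical helper: it only arranges files on disk and makes no
    assumption about how the correction tool is invoked afterwards.
    """
    pooled = Path(pooled_dir)
    pooled.mkdir(parents=True, exist_ok=True)
    linked = []
    for index, dataset in enumerate(dataset_dirs):
        dataset = Path(dataset)
        for src in sorted(dataset.rglob("*.fast5")):
            # Build a unique link name from the dataset index and the file's
            # relative path, so equal file names in different runs cannot clash.
            rel_name = "_".join(src.relative_to(dataset).parts)
            dest = pooled / f"ds{index}_{rel_name}"
            if not os.path.lexists(dest):
                os.symlink(src.resolve(), dest)
            linked.append(dest)
    return linked


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    runs = ["run1/fast5", "run2/fast5", "run3/fast5", "run4/fast5"]
    links = pool_fast5(runs, "pooled_fast5")
    print(f"linked {len(links)} fast5 files")
```

Symlinking rather than copying keeps the original run folders untouched, so you can still go back to per-dataset correction if the pooled run gives you trouble.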

-Tamas

Hi Tamas,

Thank you for your suggestion. I will try both tools.

Athul