Question about model's training data
huangnengCSU opened this issue · 2 comments
Hi:
MarginPolish && HELEN is an excellent pipeline for polishing ONT assemblies: it is easy to run and very accurate. I am using the latest models to polish some human data. I wonder what data you used to train the models MP_r941_guppy344_human.json and HELEN_r941_guppy344_human.pkl. The training datasets for these two models are not mentioned in the paper. Which sample and which chromosomes were used: HG002, CHM13 or HG00733, and chr1-6 or chr1-19, chr21-22?
Neng
MP_r941_guppy344_human.json and HELEN_r941_guppy344_human.pkl use the same training scheme that is explained in the paper, but with data basecalled with the guppy 3.4.4 model. The underlying training data is the same: HG002 chr1-19.
Just to update you, internally we have switched to a new polisher (https://github.com/kishwarshafin/pepper) that produces similar or better accuracy than MarginPolish-HELEN if your data is basecalled with guppy 3.0.5 or higher.
@kishwarshafin
Thanks so much for your response. I will try your new polisher to generate a more accurate assembly.