nanoporetech/rerio

methylation CpG model vs all-context model

PanZiwei opened this issue · 3 comments

Hi,
I found two models relevant to 5mC: res_dna_r941_min_modbases_5mC_CpG_v001 and res_dna_r941_min_modbases_5mC_v001. Is the former trained on CpG sites only and the latter trained in all contexts? Is there any difference in performance between them? Which one is better if I am interested in detecting human CpG methylation?

Currently, I am using Megalodon with res_dna_r941_min_modbases_5mC_v001 to identify base modifications. How can I filter non-CG contexts out of the results? Is it possible to pass Megalodon an option such as --alternate-bases CpG or --alternate-bases 5mC, as in Tombo?

The res_dna_r941_min_modbases_5mC_v001 all-context model is the recommended model. The res_dna_r941_min_modbases_5mC_CpG_v001 model is still available, but it is not listed in the summary table in the README because it is no longer recommended. The newer res_dna_r941_min_modbases_5mC_v001 model performs well in CH contexts and outperforms the CpG model in CG contexts.

To restrict results to CG contexts, Megalodon provides the --mod-motif argument (for CG contexts, set --mod-motif m CG 0). See the Megalodon documentation and the output of megalodon -h for further details.
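If results have already been produced in all contexts, the same restriction can be applied afterwards. The sketch below is only an illustration of the idea behind a "CG, position 0" motif and is not part of Megalodon: the function names and the (position, strand, score) record layout are hypothetical, and the reference is assumed to be available as a plain string with 0-based coordinates.

```python
def motif_positions(ref_seq, motif="CG", rel_pos=0):
    """Return 0-based forward-strand positions of the base at rel_pos in each motif hit."""
    ref_seq = ref_seq.upper()
    hits = set()
    start = ref_seq.find(motif)
    while start != -1:
        hits.add(start + rel_pos)
        # Advance by one so overlapping motif hits are also found.
        start = ref_seq.find(motif, start + 1)
    return hits


def filter_to_cpg(calls, ref_seq):
    """Keep hypothetical (pos, strand, score) records whose cytosine is in a CG context."""
    # Forward strand: the called C is the first base of a reference CG.
    fwd = motif_positions(ref_seq, "CG", 0)
    # Reverse strand: the called C pairs with the reference G of the same CG dinucleotide.
    rev = motif_positions(ref_seq, "CG", 1)
    return [
        call for call in calls
        if (call[1] == "+" and call[0] in fwd) or (call[1] == "-" and call[0] in rev)
    ]


if __name__ == "__main__":
    ref = "ACGTCCAGCG"
    calls = [(1, "+", 0.9), (4, "+", 0.1), (2, "-", 0.8), (7, "-", 0.2), (8, "+", 0.7)]
    for call in filter_to_cpg(calls, ref):
        print(call)
```

Setting --mod-motif at call time, as described above, avoids this extra step; a post-hoc filter like this is only useful when all-context output already exists.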

@marcus1487,

Wait... I think there are typos in your comment? res_dna_r941_min_modbases_5mC_v001 should be the all-context model, and it is the recommended model, not res_dna_r941_min_modbases_5mC_CpG_v001?

Yes. You are correct. Apologies. I will amend my previous comment to show the correct models. Good spot!