nanoporetech/rerio

Rerio training model materials

PanZiwei opened this issue · 2 comments

Hi,
Is it possible to provide more details about the training datasets used for res_dna_r941_min_modbases_5mC_CpG_v001 and res_dna_r941_min_modbases_5mC_v001? Were they trained with native human data, E. coli data, or synthetic samples?

Also, is there any plan to release the training datasets for the Rerio models in the repo in the future for research purposes?

Thank you so much for your help!

Apologies for the late response to this issue. The mentioned models were trained with a combination of synthetically modified human data (PCR and M.SssI) as well as native human data with matched bisulfite data.

The more recent Remora models have greatly simplified the training process for the 5mC CpG model, which is now trained only from synthetically modified human reads (Remora model performance is not affected much by the source material, since only 10 bases of signal are presented to the model at a time).

We are looking into options for releasing the Remora training datasets.

@marcus1487
Thanks for the reply. So you are planning to replace the Rerio models with the latest Remora models in the near future, right?