leylabmpi/resmico

How to train ResMiCo on my own dataset?


Hi there,

Many thanks for your great work. I wonder how I can train your model on my own dataset. Specifically, how do I assign a label to each contig? I also noticed that DeepMAsED is from your team as well; could you tell me how to train that model too?

Looking forward to your reply, thanks!

We used simulated data as ground truth for training, since for simulations we know all of the misassemblies. You can use the simulation dataset that we used for the paper, or you can simulate more data, possibly new simulated data that better matches your real dataset.

See https://github.com/leylabmpi/resmico/tree/master/ResMiCo-SM for info on how to perform the simulations.

I have simulated a dataset myself and collected the label file from metaQUAST, but I don't know how to integrate the label file to train your model. Do I have to follow ResMiCo-SM to simulate the dataset?

You just need a feature table in the same format as the one generated by ResMiCo-SM. See https://github.com/leylabmpi/resmico/tree/master/ResMiCo-SM#features-table
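If your labels come from metaQUAST, one step is joining a list of misassembled contig IDs onto a per-contig feature table. The sketch below shows only that join, under stated assumptions: the tab-separated layout, the `contig` column name, the `is_misassembled` label column, and the one-ID-per-line input are all hypothetical. Check the actual column names and label encoding against the ResMiCo-SM features-table documentation linked above, and parse your metaQUAST output into a plain list of contig names first.

```python
import csv
import io
from typing import Iterable, Set


def load_misassembled_ids(lines: Iterable[str]) -> Set[str]:
    """Read contig IDs, one per line; blank lines and '#' comments are skipped.

    Hypothetical input format: a plain list of misassembled contig names
    extracted beforehand from metaQUAST output.
    """
    ids = set()
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            ids.add(name)
    return ids


def add_label_column(feature_tsv: str, misassembled: Set[str],
                     contig_col: str = "contig",
                     label_col: str = "is_misassembled") -> str:
    """Append a 0/1 label column to every row of a tab-separated feature table.

    Column names are assumptions, not the confirmed ResMiCo-SM schema.
    """
    reader = csv.DictReader(io.StringIO(feature_tsv), delimiter="\t")
    fields = reader.fieldnames
    if fields is None or contig_col not in fields:
        raise ValueError(f"feature table has no '{contig_col}' column")
    if label_col in fields:
        raise ValueError(f"feature table already has a '{label_col}' column")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(fields) + [label_col],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Every row (e.g. every position) of a misassembled contig gets label 1.
        row[label_col] = "1" if row[contig_col] in misassembled else "0"
        writer.writerow(row)
    return out.getvalue()
```

For example, `add_label_column("contig\tposition\tcoverage\nctg1\t0\t12\nctg2\t0\t8\n", {"ctg2"})` marks only the `ctg2` row with `1`. Using the standard-library `csv` module keeps the sketch dependency-free; for large tables a streaming or pandas-based version would be more practical.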

OK! Many thanks for your help :)