IdoSpringer/ERGO-II

Instructions for training a new model

Closed this issue · 3 comments

Hi there, thanks a lot for your contribution to ERGO and ERGO-II, which are really useful.
I want to train a new model on different datasets, but the training instructions are missing from the README file.
I would appreciate it if you could add it. Thank you very much.

Hi,
Currently ERGO models support only the McPAS and VDJdb datasets.
Since different datasets have different formats, there is no clear way to adjust the code for handling arbitrary datasets.
You can adapt the Sampler.py file to handle different datasets,
and adapt the main function and its arguments in Trainer.py for training (this is very similar to the ERGO-I instructions in the ERGO-I repo).
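
As a rough illustration of the kind of adaptation meant here, the sketch below reads TCR-epitope pairs from a delimited file with arbitrary column names and keeps only rows whose sequences use the 20 standard amino-acid letters. It is not taken from the ERGO-II code: the function names, the column names `cdr3` and `antigen`, and the output keys `tcrb` and `peptide` are all hypothetical, and the record format that Sampler.py actually expects should be checked in the repository before reusing any of this.

```python
import csv
import io

# The 20 standard amino-acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_sequence(seq):
    """Return True if seq is non-empty and uses only standard amino-acid letters."""
    return bool(seq) and set(seq) <= AMINO_ACIDS


def load_pairs(handle, tcr_col, epitope_col, delimiter=","):
    """Read TCR-epitope pairs from a delimited file with arbitrary column names.

    The output keys ("tcrb", "peptide") are hypothetical placeholders,
    not the format used by ERGO-II's Sampler.py.
    """
    pairs = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        tcr = (row.get(tcr_col) or "").strip().upper()
        epitope = (row.get(epitope_col) or "").strip().upper()
        # Skip rows with missing or non-standard sequences.
        if is_valid_sequence(tcr) and is_valid_sequence(epitope):
            pairs.append({"tcrb": tcr, "peptide": epitope})
    return pairs


# Small in-memory example; the column names are made up for illustration.
example_csv = (
    "cdr3,antigen\n"
    "CASSLGQAYEQYF,GILGFVFTL\n"
    "CASS*X,NLVPMVATV\n"
    ",GILGFVFTL\n"
)
pairs = load_pairs(io.StringIO(example_csv), "cdr3", "antigen")
print(pairs)
```

Only the first example row survives the filter, because the second has non-standard characters in the TCR sequence and the third has no TCR sequence at all.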

@IdoSpringer Hello. You said "Currently ERGO models support only the McPAS and VDJdb datasets." Is that because you used only the data from these two databases for pre-training and no data from other databases? Must the TCR or epitope sequences used for training be the same as those in the pre-training dataset, or is it something else? How can I use your model to train and test on other datasets?

In addition, were the McPAS and VDJdb data filtered and corrected, or were these cases not taken into account? @IdoSpringer