How to train a model by myself
Hello, I have a few questions that I would like to ask you:
- The open-source code only provides prediction and fine-tuning code, without full training code, right? If I want to train my own model, do I need to write the training pipeline myself?
- As I understand it, the model makes predictions by reading the downloaded pretrained parameter files, and that is why we can see all the technical details. Is this understanding correct?
- "Training low resolution random models", I mean the fine-tuning code for the last part of the official Notebook instance. How did he implement it? It seems that he did not download the training data,is it just randomly adjusting the parameters in the parameter file?
Thank you very much!
Excuse me, have you managed to train a new GraphCast model yourself?
Thanks for your message. The open-source code provides a "loss" function, which you can use both to train and to fine-tune the model, provided it fits on your hardware. However, you would need to provide your own data iterators and implement batch parallelism (to train on multiple devices simultaneously and thereby reduce training time) for your specific platform.
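For what it is worth, a from-scratch training loop built on that loss function might look roughly like the sketch below. It assumes `loss_fn` is the Haiku-transformed loss from the demo notebook (already closed over the model and task configs), and that `params`, `state` and `data_iterator` are supplied by the user; none of these, nor the hyperparameters, are part of the released code.

```python
import jax
import optax

# Placeholder hyperparameters; the real ones are in the paper's supplement.
optimizer = optax.adamw(learning_rate=1e-3)


def train_step(params, state, opt_state, rng, inputs, targets, forcings):
  """One gradient step on a single batch."""

  def compute_loss(p):
    # `loss_fn` is assumed to be the hk.transform_with_state-wrapped loss
    # from the demo notebook, closed over the model/task configs.
    (loss, diagnostics), next_state = loss_fn.apply(
        p, state, rng, inputs, targets, forcings)
    return loss, (diagnostics, next_state)

  (loss, (_, next_state)), grads = jax.value_and_grad(
      compute_loss, has_aux=True)(params)
  updates, opt_state = optimizer.update(grads, opt_state, params)
  params = optax.apply_updates(params, updates)
  return params, next_state, opt_state, loss


# `params`/`state` would come from loss_fn.init(...) (random init) or from a
# released checkpoint (fine-tuning). `data_iterator` is a user-supplied
# pipeline yielding (inputs, targets, forcings) batches; the repo does not
# provide one.
rng = jax.random.PRNGKey(0)
opt_state = optimizer.init(params)
for inputs, targets, forcings in data_iterator:
  rng, step_rng = jax.random.split(rng)
  params, state, opt_state, loss = train_step(
      params, state, opt_state, step_rng, inputs, targets, forcings)
```

For multi-device batch parallelism, the per-step computation would additionally need to be mapped across devices (e.g. with jax.pmap, using the repo's xarray_jax helpers to handle the xarray inputs); that part is intentionally left out of this sketch.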
Thank you very much for your answer. I'm a beginner, and it's hard for me to reproduce a model as complex as GraphCast. I want to learn from such a good model, but the training details are not mentioned in the paper, which is not enough for me to complete the reproduction independently. Could you please provide an example of training from scratch that I can use as a reference?
but the training details are not mentioned in the paper
To the best of our knowledge, all training details for minimizing the loss (optimizer, batch size, trajectory sampling, learning rate schedules, etc.) are provided in the supplementary materials of the paper (sections 4.4 and 4.5).
If you find that something is missing, please let us know and we will be more than happy to clarify!
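As an illustration of how such a recipe (an optimizer plus a learning-rate schedule) can be expressed in optax, here is a sketch assuming a linear warm-up followed by cosine decay with AdamW. Both the schedule shape and all numeric values below are placeholders, not the paper's actual hyperparameters, which are the ones documented in sections 4.4 and 4.5.

```python
import optax

# Illustrative only: schedule shape and numbers are placeholders; the actual
# hyperparameters are in sections 4.4 and 4.5 of the paper's supplement.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=1e-3,      # placeholder peak learning rate
    warmup_steps=1_000,   # placeholder warm-up length
    decay_steps=300_000,  # placeholder total number of training steps
    end_value=0.0,
)

optimizer = optax.chain(
    optax.clip_by_global_norm(32.0),  # placeholder gradient-clipping norm
    optax.adamw(learning_rate=schedule, weight_decay=0.1),  # placeholder decay
)
```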
Hello, I have some questions about model training. Have you tried training models at the different resolutions, GraphCast_small (13 levels, 1°) and GraphCast (37 levels, 0.25°)? How much time and memory does it take to train these two models?
I look forward to your response. Thank you.
Best regards!
@zhongmengyi I have replied in your separate issue #77