Q: How to finetune?
SpaceCowboy850 opened this issue · 2 comments
I have trained TinyStories down to the validation loss that Karpathy achieves.
Now I want to fine-tune on top of this.
I imagine it works much like "resume", except without resetting iter_num or best_val_loss.
I'm a little uncertain about the cosine scheduler, since I'd be starting from a model that already produces decently good results. Or maybe keep it, but lower the learning rate so it isn't as high as on the initial pass? And keep or ditch the warmup phase?
Guidance on this would be appreciated!
Reading up on this, it seems we should keep the warmup phase, since it lets AdamW's moment estimates settle in a bit for the new finetune.
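For reference, here's a minimal sketch of the warmup + cosine decay schedule as it appears in nanoGPT-style train scripts. The knob names (learning_rate, warmup_iters, lr_decay_iters, min_lr) follow that convention and the values are just guesses for a finetune run, not anything Karpathy recommends:

```python
import math

# assumed finetune hyperparameters (names follow nanoGPT-style train.py);
# a finetune run uses a smaller learning_rate and a short warmup
learning_rate = 1e-4      # well below the pretraining LR
warmup_iters = 100        # brief warmup so AdamW settles on the new data
lr_decay_iters = 5000     # total iterations for the finetune schedule
min_lr = 1e-5             # floor for the cosine decay

def get_lr(it):
    # 1) linear warmup for warmup_iters steps
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    # 2) past lr_decay_iters, hold at the minimum learning rate
    if it > lr_decay_iters:
        return min_lr
    # 3) in between, cosine-decay from learning_rate down to min_lr
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return min_lr + coeff * (learning_rate - min_lr)
```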
I also attempted to implement layer thawing, but on a very small run the loss was worse than just letting the whole thing train on top of the existing weights.
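In case it's useful to anyone, this is roughly the kind of freezing/thawing I was experimenting with. It's only a sketch, written against a nanoGPT-style GPT whose blocks live in model.transformer.h (with ln_f and lm_head); adjust the attribute names for other architectures, e.g. a llama2.c Transformer keeps its blocks in model.layers:

```python
# a sketch of layer thawing (gradual unfreezing); attribute names assume a
# nanoGPT-style model (model.transformer.h, model.transformer.ln_f, model.lm_head)

def freeze_all_but_last(model, n_unfrozen=2):
    """Freeze everything, then re-enable grads on the last n_unfrozen blocks
    plus the final layer norm and head, so only those get updated at first."""
    for p in model.parameters():
        p.requires_grad = False
    for block in model.transformer.h[-n_unfrozen:]:
        for p in block.parameters():
            p.requires_grad = True
    for p in model.transformer.ln_f.parameters():
        p.requires_grad = True
    for p in model.lm_head.parameters():
        p.requires_grad = True

def thaw_one_more_block(model, n_currently_unfrozen):
    """Unfreeze the next-deepest block; call every few hundred iterations."""
    idx = len(model.transformer.h) - n_currently_unfrozen - 1
    if idx >= 0:
        for p in model.transformer.h[idx].parameters():
            p.requires_grad = True
    return n_currently_unfrozen + 1
```

Note that the optimizer should only see trainable parameters, e.g. build it over `filter(lambda p: p.requires_grad, model.parameters())`, or rebuild it after each thaw step.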
Ugh - as usual the information is out there; I've either just missed it or forgotten about it.
In this case, nanoGPT has the guidance I was looking for:
https://github.com/karpathy/nanoGPT
"Finetuning is no different than training, we just make sure to initialize from a pretrained model and train with a smaller learning rate."
So, there it is. Okay...off to do some more training.