High compute time: what is a reasonable generator model to get decent results to play with?
jplasser opened this issue · 1 comment
Great resource and course! Thanks for the great work!
I converted parts of this repository to Google Colab and ran generate.py, which would take something like 1-2 days (~32-48 hrs) to complete with the default settings. What is a reasonable num_batches value for training the generator model?
Regards,
Juergen
The default is a little misleading there, since I just set it to a high value so it keeps running until the output looks good. I think after 50_000 batches (3-4 hours on an old GPU), you should definitely be able to tell that the model is learning something (~1.7 bits per byte on the validation data). After about 24 hours, you should get down to about 1.5 bits per byte (I think the example from the blog post was generated after 24h of training).
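For reference, "bits per byte" here is just the mean cross-entropy of the model's predictions converted from nats to bits: divide by ln 2. A minimal sketch of how you could compute it during training, assuming byte-level data where one token is one byte (the helper below is illustrative, not a function from this repo):

```python
import math

import torch
import torch.nn.functional as F

def bits_per_byte(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Mean cross-entropy over byte-level targets, converted from nats to bits.

    logits:  (batch, seq_len, vocab) raw model outputs
    targets: (batch, seq_len) integer byte values in [0, 255]
    """
    # F.cross_entropy expects (batch, vocab, seq_len) and returns the
    # mean negative log-likelihood in nats.
    nats = F.cross_entropy(logits.transpose(1, 2), targets).item()
    # One nat = 1/ln(2) bits; with one token per byte, nats/token == nats/byte.
    return nats / math.log(2)
```

So ~1.7 bits per byte corresponds to a mean cross-entropy of about 1.18 nats, and ~1.5 bits per byte to about 1.04 nats.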
All of this is a bit long to wait for on Colab, so I recommend starting with something like 4 transformer blocks and a context of 64. The results won't be as impressive, but it'll converge much quicker. Reducing the context size is especially helpful, since self-attention cost grows quadratically with context length; see the sketch below.
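To make that scale concrete, here is a minimal sketch in plain PyTorch of a byte-level model at roughly that size. This is illustrative only, not the repo's actual model class, and the embedding size and head count are assumptions you should tune:

```python
import torch
import torch.nn as nn

# Illustrative scaled-down settings; the repo's defaults may differ.
CONTEXT = 64   # sequence length: attention cost grows quadratically with this
DEPTH = 4      # number of transformer blocks
EMB = 128      # embedding dimension (assumed, tune as needed)
HEADS = 8      # attention heads (assumed, tune as needed)
VOCAB = 256    # byte-level vocabulary

class TinyByteTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, EMB)
        self.pos = nn.Embedding(CONTEXT, EMB)
        layer = nn.TransformerEncoderLayer(
            d_model=EMB, nhead=HEADS, dim_feedforward=4 * EMB,
            batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=DEPTH)
        self.out = nn.Linear(EMB, VOCAB)

    def forward(self, x):  # x: (batch, seq_len) byte indices, seq_len <= CONTEXT
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok(x) + self.pos(pos)[None, :, :]
        # Causal mask (-inf above the diagonal) so each position only
        # attends to earlier bytes.
        mask = torch.full((x.size(1), x.size(1)), float("-inf"), device=x.device)
        mask = torch.triu(mask, diagonal=1)
        h = self.blocks(h, mask=mask)
        return self.out(h)  # (batch, seq_len, VOCAB) logits
```

At a context of 64, each attention head computes 64² = 4,096 scores per block instead of, say, 256² = 65,536 at a context of 256, a 16x reduction, which is where most of the speedup comes from. At this scale you can also set num_batches well below the default and just stop once the samples start to look coherent.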