NO dropout in MLP and CausalSelfAttention
peter-ni-noob opened this issue · 2 comments
peter-ni-noob commented
There is no dropout in MLP or CausalSelfAttention.
unclecode commented
Yes, he covered this in his video. Overfitting is not a major concern for this project because the dataset is virtually infinite: even after 4 epochs and 40 billion tokens, the validation loss is still decreasing steadily.
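For anyone who wants to see what leaving dropout out amounts to, here is a minimal sketch of standard inverted dropout in plain Python. It is not code from the repository; the `dropout` helper and the sample `activations` are hypothetical and only illustrate that a drop probability of 0 (or evaluation mode) passes activations through unchanged, which is effectively what a model without dropout layers does.

```python
import random


def dropout(xs, p, training=True, rng=None):
    """Inverted dropout on a list of floats (hypothetical helper, not from the repo)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # With p == 0 or outside training, dropout is the identity.
    if not training or p == 0.0:
        return list(xs)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Zero each element with probability p; scale survivors by 1 / keep
    # so the expected value of each activation is unchanged.
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


activations = [0.5, -1.0, 2.0, 0.25]  # made-up values for illustration

no_drop = dropout(activations, p=0.0)                        # identity
dropped = dropout(activations, p=0.5, rng=random.Random(0))  # some zeros, rest doubled
```

If overfitting did show up, for example when fine-tuning on a small dataset, a nonzero `p` could be reintroduced; in PyTorch this would normally be done with `torch.nn.Dropout` rather than a hand-written helper like the one above.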