NO dropout in MLP and CausalSelfAttention

Question

NO dropout in MLP and CausalSelfAttention

peter-ni-noob opened this issue 7 months ago · 2 comments

Answer 1 · 2024-06-23T05:14:50.000Z

Yes, he elaborated on this in his video. Overfitting is not a major concern for this project because the dataset is virtually infinite relative to the training budget. Even after 4 epochs (40 billion tokens), the validation loss is still decreasing steadily.
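For context, dropout is the regularizer being left out of MLP and CausalSelfAttention. Below is a minimal sketch of inverted dropout in plain Python. The function name `dropout` and its parameters are illustrative assumptions, not taken from the project's code. With `p = 0.0` it reduces to the identity, which is effectively what a block without dropout layers computes.

```python
import random

def dropout(xs, p, training=True, rng=random):
    """Hypothetical helper: inverted dropout over a list of floats.

    During training, each element is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
    In eval mode, or when p == 0.0, the input is returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(xs)  # identity: the no-dropout case
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * scale for x in xs]
```

If overfitting did become a concern, for example on a small dataset that is revisited many times, a nonzero `p` would be the knob to turn. When the validation loss is still falling, as described above, there is little to regularize, so leaving it out keeps the model simpler.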