
NO dropout in MLP and CausalSelfAttention

peter-ni-noob opened this issue · 2 comments

NO dropout in MLP and CausalSelfAttention

Yes, he elaborated on this topic in his video. Dropout is a regularizer against overfitting, and overfitting is not a major concern for this project overall because the dataset is virtually infinite. Even after 4 epochs and 40 billion tokens, the validation loss is still decreasing steadily.
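For anyone who wants to experiment with adding it back, here is a minimal pure-Python sketch of what a dropout step does. It is not code from this repo: the `dropout` helper, its parameters, and the sample `activations` are all hypothetical names made up for illustration. It assumes the usual "inverted dropout" convention, where kept values are rescaled by `1 / (1 - p)` during training.

```python
import random


def dropout(xs, p=0.0, training=True, rng=None):
    """Inverted dropout over a list of floats (hypothetical helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # With p == 0.0, or outside training, dropout is the identity.
    if not training or p == 0.0:
        return list(xs)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # Zero each value with probability p; rescale the survivors so the
    # expected value of each element is unchanged.
    return [x * scale if rng.random() >= p else 0.0 for x in xs]


activations = [0.5, -1.0, 2.0, 0.25]  # made-up example values

no_dropout = dropout(activations, p=0.0)                  # same as the input
with_dropout = dropout(activations, p=0.5, rng=random.Random(0))
eval_mode = dropout(activations, p=0.5, training=False)   # same as the input
```

With `p=0.0` the step changes nothing, so leaving dropout out of MLP and CausalSelfAttention is equivalent to running with a dropout rate of zero. If the validation loss ever started rising while the training loss kept falling, a nonzero `p` would be one thing to try.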