codeboy5/transformers_grokking

PyTorch Re-Implementation of the Grokking Phenomenon

This is a PyTorch re-implementation of the paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets".

I chose this paper to reproduce because it gave me the chance to code and train a GPT-style model from scratch.
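For context, the paper trains on small tables of binary operations such as modular addition, holding out part of each table for validation. The snippet below is a minimal sketch of such a dataset, assuming addition modulo p = 97 and a 50% training split. The function name `make_modular_addition_split` and its parameters are hypothetical and are not taken from this repository, whose data pipeline may differ.

```python
import random


def make_modular_addition_split(p=97, train_fraction=0.5, seed=0):
    """Build every equation a + b = c (mod p) and split it into train/val.

    Hypothetical helper for illustration; not the repository's own code.
    """
    # Enumerate the full operation table as (a, b, c) triples.
    equations = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(equations)

    # The first part of the shuffled table is used for training,
    # the remainder is held out for validation.
    n_train = int(train_fraction * len(equations))
    return equations[:n_train], equations[n_train:]


train, val = make_modular_addition_split()
print(len(train), len(val))
```

Each triple would still have to be turned into a token sequence before it can be fed to a GPT-style model; that step is left out here.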

References used for the code:

  1. minGPT by Andrej Karpathy

Accuracy and Loss Curves for Adam (with any weight decay)

Image 1 Image 2

Accuracy and Loss Curves for AdamW (weight decay λ = 1)

Image 1 Image 2
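To make the role of λ concrete, here is a minimal sketch of a single AdamW update for one scalar weight. In AdamW the decay is decoupled: the weight is shrunk by a factor of (1 - lr · λ) directly, separately from the gradient-based Adam step. The function `adamw_step` is hypothetical, and the default learning rate, betas, and epsilon are assumed values for illustration, not settings confirmed by this repository. In PyTorch the corresponding setting is the `weight_decay` argument of `torch.optim.AdamW`.

```python
import math


def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.98,
               eps=1e-8, weight_decay=1.0):
    """One AdamW update for a single scalar weight (illustrative only).

    w: current weight, grad: its gradient, m/v: running first and second
    moments, t: 1-based step count. Hyperparameter defaults are assumptions.
    """
    # Decoupled weight decay: shrink the weight directly, without
    # passing the decay term through the adaptive gradient scaling.
    w = w * (1.0 - lr * weight_decay)

    # Standard Adam moment estimates computed from the raw gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Bias correction for the zero-initialised moments.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Adaptive gradient step.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# With a zero gradient, only the decay term changes the weight.
w, m, v = adamw_step(w=1.0, grad=0.0, m=0.0, v=0.0, t=1)
print(w)
```

Setting `weight_decay=0.0` in this sketch reduces the update to a plain Adam step.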
