GPT-2 sucks at third-grade math; I wonder if we can do better.
We will be using this dataset.
Joint work by Helen Ngo, Joseph Palermo and Michael Jia, with support from Rayhane Mama.
All of these operate at the character level:
- LSTM with teacher forcing
- tiny Transformer
- add an encoder
- regular Transformer
- fine-tune the 1558M GPT-2 (there's some nuance here)
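The character-level setup and teacher forcing in the list above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the actual pipeline: the vocabulary, the sample question/answer string, and the function names are all made up.

```python
def build_vocab(text):
    """Map each distinct character to an integer id (character-level tokenization)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}

def encode(text, vocab):
    """Turn a string into a list of character ids."""
    return [vocab[ch] for ch in text]

def teacher_forcing_pairs(ids):
    """With teacher forcing, the model is fed the ground-truth character at
    step t and trained to predict the character at step t+1, so inputs and
    targets are the same sequence shifted by one position."""
    return ids[:-1], ids[1:]

# Hypothetical problem/answer pair, tab-separated for illustration.
sample = "What is 3 + 4?\t7"
vocab = build_vocab(sample)
ids = encode(sample, vocab)
inputs, targets = teacher_forcing_pairs(ids)
```

At sampling time there is no ground truth to feed back, so the model consumes its own previous prediction instead; that train/test mismatch is one reason the decoder-only and encoder variants are worth comparing.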
This is mostly a to-read list.