My playground for language modeling.
I'll be using the TinyStories dataset from *TinyStories: How Small Can Language Models Be and Still Speak Coherent English?* (Eldan and Li, 2023).
- GPT
- Mamba
- Based
- ...
- Inspired by *Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs* (Frankle et al., 2020)
- ./runs/tury2cp5/TinyLM-006000.pt