transformer-memorization

Trained Model Checkpoints

To support this effort, we have made several trained model checkpoints publicly available.

Model Details

| Params | Batch Size | Num Layers | Dim Model | Num Heads | Cores per Replica | Replicas per Batch | GAS | TPU Size |
|---|---|---|---|---|---|---|---|---|
| 162,675,936 | 512 | 12 | 768 | 16 | 8 | 2 | 8 | 256 |
| 304,663,776 | 512 | 32 | 768 | 16 | 8 | 1 | 16 | 256 |
| | 512 | 28 | | | | | | 256 |
| | 512 | 28 | | | | | | 256 |
| | 512 | 28 | | | | | | 256 |
| | 512 | 28 | | | | | | 256 |
| | 512 | 28 | | | | | | 256 |
| | 512 | 28 | | | | | | 256 |
| 6,053,381,344 | 512 | 28 | 4,096 | 16 | 16 | 1 | 16 | 256 |
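
As a rough consistency check, the parameter counts above can be approximated from Num Layers and Dim Model using the standard 12·L·d² estimate for transformer blocks plus embedding parameters. The sketch below is an approximation, not the exact counting behind the Params column, and the vocabulary size of 50,400 (the GPT-J tokenizer) is an assumption, since the vocabulary is not listed in the table.

```python
# Approximate transformer parameter count from the table's hyperparameters.
# This is a sketch, not the exact bookkeeping used for the Params column.

VOCAB_SIZE = 50_400  # assumed (GPT-J tokenizer); not stated in the table

def approx_params(num_layers: int, dim_model: int) -> int:
    """Standard 12 * L * d^2 block estimate plus embedding parameters."""
    # Each block: ~4*d^2 (attention) + 8*d^2 (feed-forward) = 12*d^2,
    # ignoring biases and layer norms.
    blocks = 12 * num_layers * dim_model ** 2
    # Separate input embedding and output projection matrices.
    embeddings = 2 * VOCAB_SIZE * dim_model
    return blocks + embeddings

# Rows from the table: (reported params, num layers, dim model)
for reported, layers, dim in [
    (162_675_936, 12, 768),
    (304_663_776, 32, 768),
    (6_053_381_344, 28, 4096),
]:
    estimate = approx_params(layers, dim)
    print(f"reported {reported:>13,}  estimate {estimate:>13,}  "
          f"off by {abs(reported - estimate) / reported:.2%}")
```

The estimates land within about 0.25% of the reported counts, with the residual coming from biases, layer norms, and similar small terms.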
Evaluation Details

  • The script evaluation_script.py evaluates the memorization of input tfrecords using the memorization metric (a sketch of the metric appears after this list)
  • The scripts take the following arguments:
    • --wandb-project-name : the wandb project name for the current run
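
For illustration, here is a minimal sketch of a memorization check in the spirit of the metric above, assuming the metric is token-level exact match between a model's greedy continuation of a prefix and the ground-truth continuation. The function signature, the 32-token prefix length, and the dummy model are hypothetical; the authoritative definition is the one in evaluation_script.py.

```python
from typing import Callable, Sequence

def memorization_score(
    tokens: Sequence[int],
    generate: Callable[[Sequence[int], int], Sequence[int]],
    prefix_len: int = 32,
) -> float:
    """Fraction of continuation tokens the model reproduces exactly.

    `tokens` is one sequence from an input tfrecord; `generate(prefix, n)`
    should return the model's greedy continuation of `prefix`, `n` tokens
    long. Both the prefix length and the exact-match definition are
    assumptions for illustration.
    """
    prefix, truth = tokens[:prefix_len], tokens[prefix_len:]
    continuation = generate(prefix, len(truth))
    matches = sum(a == b for a, b in zip(continuation, truth))
    return matches / len(truth)

# Toy usage with a dummy "model" that just repeats the last prefix token.
if __name__ == "__main__":
    dummy = lambda prefix, n: [prefix[-1]] * n
    sequence = list(range(40))
    print(memorization_score(sequence, dummy))  # 0.0: nothing memorized
```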