nshepperd/gpt-2

Training on TPU

Dhanachandra opened this issue · 1 comment

How can I train GPT2-xl on a TPU? Which TPU can be used for training, and how much RAM would be needed?

I'm not 100% sure, because I decided to ditch my TPU efforts before I got training working (TPUs ended up being way too expensive, and during my dev work I was on a much too small VM, so training kept failing due to OOM errors on the VM). But I think if you put the following code before the tf.Session() is created in train.py, it will connect to a TPU:

tpu_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="<TPU NODE NAME HERE>")
tf.config.experimental_connect_to_cluster(tpu_resolver)
tf.tpu.experimental.initialize_tpu_system(tpu_resolver)
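
For what it's worth, here's a rough sketch of how that might fit around the session creation in train.py. It's untested; in particular, passing the resolver's master address as the session target is my assumption about how to direct ops to the TPU, not something I verified:

import tensorflow as tf

# Resolve and initialize the TPU before any session is created.
tpu_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="<TPU NODE NAME HERE>")
tf.config.experimental_connect_to_cluster(tpu_resolver)
tf.tpu.experimental.initialize_tpu_system(tpu_resolver)

# Point the session at the TPU worker instead of the default local target (my assumption).
sess = tf.Session(target=tpu_resolver.master())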