Allow users to use Colaboratory's TPU for finetuning
minimaxir opened this issue · 7 comments
This alone will be the single biggest improvement for gpt-2-simple:
- 8 cores
- ~2x speed increase per core relative to a K80
= up to 16x training speed
Unfortunately, the documentation for using Colaboratory's TPU is a bit messy.
According to what I see here, the 8x speed-up only applies with a batch_size of 8 or more, since the batches are distributed among the cores. However, if you're already using a batch_size of 2, changing it to 8 should still give about a 4x speed-up, which is still very nice.
All the documentation I can find on using TPUs seems to use TensorFlow's Keras API, like this example, so the model might have to be converted to that.
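For reference, the TF 1.x-era Colab conversion pattern looked roughly like this (a sketch, not tested here; keras_model stands in for a Keras build of GPT-2, which gpt-2-simple does not currently have):

```python
# Sketch of the standard Colab TPU conversion for a Keras model
# (TF 1.x, tf.contrib era). keras_model is a hypothetical Keras
# build of GPT-2.
import os
import tensorflow as tf

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']  # set by the Colab TPU runtime
resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
strategy = tf.contrib.tpu.TPUDistributionStrategy(resolver)
tpu_model = tf.contrib.tpu.keras_to_tpu_model(keras_model, strategy=strategy)
```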
I do not use batch_size=2, and I believe it is a trap, as the GPU is almost fully utilized even at batch_size=1. In testing, they have the same throughput, except batch_size=1 doesn't max out the GPU memory.
Workflows that use the TPU do use batch_size=8 (or implement it via batch_size *= 8), and it's theoretically possible with the correct TensorFlow distribution strategy.
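To make the batch arithmetic concrete, here's a minimal sketch (the 8-core count matches the Colab TPU topology):

```python
# Each TPU core processes its own slice of the global batch, so the
# global batch size should be a multiple of the core count.
TPU_CORES = 8            # Colab TPU v2 exposes 8 cores
batch_size = 1           # per-core batch, same as the current GPU workflow
batch_size *= TPU_CORES  # global batch fed to the distributed model -> 8
```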
Hi all! I've been playing with the idea of making this run on a Colaboratory TPU. So far, no luck, but I seem to be really close.
The code is a mess right now -- my approach was first to make it work, then simplify and clean up.
I'm currently stuck at the point of loading the initialized model so it can be finetuned. It complains that the local file system scheme is not implemented. I understand that the TPU is instructed (through tf.train.Saver) to pick up the model from a local source even though we specify a Google Cloud address. It fails because, apparently, TPUs only work with GCS addresses for storage.
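A minimal sketch of the workaround, assuming TF 1.x and a GCS bucket you own (the bucket name below is hypothetical): mirror the checkpoint files into the bucket, then point the saver at the gs:// path instead of the local one.

```python
import tensorflow as tf
from google.colab import auth

auth.authenticate_user()  # give the Colab runtime access to your GCS bucket

LOCAL_DIR = 'models/117M'
GCS_DIR = 'gs://my-bucket/models/117M'  # hypothetical bucket

# Copy every checkpoint file into GCS so the TPU workers can read it.
for name in tf.gfile.ListDirectory(LOCAL_DIR):
    tf.gfile.Copy(LOCAL_DIR + '/' + name, GCS_DIR + '/' + name, overwrite=True)

# Assuming the GPT-2 graph has already been built in the default graph:
saver = tf.train.Saver()
with tf.Session(tpu_address) as sess:  # tpu_address from the Colab TPU runtime
    saver.restore(sess, GCS_DIR + '/model.ckpt')
```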
This is where I'm currently at: https://colab.research.google.com/drive/1_WVxlRgUjfAVZ5im2LaBoQcA5XnpU0K6
A few notes on that code:
- I reload the code from git instead of pip so I can keep sending modifications into the code and testing them; installing via pip would be way too black-boxy to see what's going on.
- I suspect Google Drive may not be needed anymore; we can probably deal with GCS and the local Colaboratory storage only. I started with the approach of making this an option that branches within the same code, but it would quickly become messy, so I think the best approach might be a whole new finetune notebook and method specific to TPU processing.
- Everything is pretty much the same as the original code. :)
I might drop out of this effort for a couple of days (weeks?) unless someone has a quick approach I might take. Regardless, if someone benefits from my progress, it will have been worth it!
For reference, this is the error:
Full error and stack trace
InvalidArgumentError Traceback (most recent call last)
InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on models/117M/model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: 'models/117M/model.ckpt')
[[node save/RestoreV2 (defined at <ipython-input-8-9187be9325b3>:101) ]]
Caused by op 'save/RestoreV2', defined at:
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 477, in start
ioloop.IOLoop.instance().start()
File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
self._handle_recv()
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2828, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-8-9187be9325b3>", line 201, in <module>
save_every=100 # how many steps between saving checkpoint
File "<ipython-input-8-9187be9325b3>", line 101, in finetune_tpu
save_relative_paths=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 832, in __init__
self.build()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 844, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 881, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 513, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 332, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 580, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1572, in restore_v2
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on models/117M/model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: 'models/117M/model.ckpt')
[[node save/RestoreV2 (defined at <ipython-input-8-9187be9325b3>:101) ]]
@AlphaGit
Also, I think the actual GPT-2 model weights are stored in GCS, so shouldn't we be able to read them directly?
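If so, a direct restore might look like this (a sketch; gs://gpt-2 is the same public bucket the official download script reads over HTTPS, and whether the TPU runtime can read it directly is an open question):

```python
# Sketch: restore straight from the public bucket instead of local disk.
# Assumes the GPT-2 graph and a tf.train.Saver have already been built.
saver.restore(sess, 'gs://gpt-2/models/117M/model.ckpt')
```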
Furthermore, you don't actually need to save the weights in Google Cloud Storage. Keras does not do that; instead, you copy the weights over to the CPU and then save them using tf.train.Saver:
https://github.com/tensorflow/tensorflow/blob/234025c31013f5aa38b63fee5cfcd6e8d5c21e17/tensorflow/contrib/tpu/python/tpu/keras_support.py#L2098
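In other words, something like this (a sketch; tpu_model and cpu_model are assumed to be the keras_to_tpu_model-converted model and an identically built CPU copy):

```python
# get_weights() on the converted TPU model syncs the trained values back to
# the host, so an identically built CPU model can adopt and save them locally.
cpu_model.set_weights(tpu_model.get_weights())
cpu_model.save_weights('gpt2_finetuned.h5')  # plain local save, no GCS needed
```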
@Skylion007 Hi! Thanks for the response.
My problem doesn't seem to be saving the weights so much as loading them in the first place. (At least... not so far.)
Yours is an interesting approach. I initially moved away from it because loading the model into memory and then transferring it to the TPU meant moving around 500 MB of data (word embeddings and all). But it might not be that bad; Colaboratory should be prepared to deal with bigger datasets all the time, right?
Regarding tf.train.Saver, I believe it is really tied to a filesystem. At least, that's what prevented me from using it against GCS... but I might have done it wrong. This is what I'm stuck on right now.
Loading them is pretty straightforward. I almost have a working solution using https://github.com/CyberZHG/keras-gpt-2, but I still need to debug some keras vs tf.keras issues. I did get it working with a fixed input shape, so it is at least possible to load it on the TPU, but I still need to fix the input and output layers.
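For anyone following along, the basic keras-gpt-2 loading path from its README looks like this (paths assume the standard 117M download; untested on the TPU side):

```python
from keras_gpt_2 import load_trained_model_from_checkpoint, get_bpe_from_files, generate

model_dir = 'models/117M'
model = load_trained_model_from_checkpoint(
    model_dir + '/hparams.json',  # model hyperparameters
    model_dir + '/model.ckpt',    # TF checkpoint with the trained weights
)
bpe = get_bpe_from_files(model_dir + '/encoder.json', model_dir + '/vocab.bpe')

# Sanity-check inference before attempting the TPU conversion.
print(generate(model, bpe, ['From the day forth, my arm'], length=20))
```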
@AlphaGit
Okay, it was way harder than it needed to be, but I got it running on a TPU. It needs a lot of optimization and runs out of memory, but otherwise it should work. Right now all it can do is load the model weights on the TPU and run inference on them: https://colab.research.google.com/drive/17I7VZrcxM-BfadRqAFWb2DzG3RoAtG5n
It's able to load and save model weights to the local disk.