Lab41/attalos

TensorFlow allocation error in new Docker containers

Closed this issue · 8 comments

TensorFlow tensor allocation error:

Running a basic IPython notebook tutorial from the website results in the following error:

InvalidArgumentError: Cannot assign a device to node 'Variable_2/read': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/GPU:0'
agude commented

@karllab41 Are you using make notebook or make attalos-bash?

agude commented

For the TensorFlow error, can you point me to a notebook that causes it? The various ones I've tried do not reproduce it.

agude commented

This notebook reproduces the Tensorflow error.

agude commented

This bug is identical to one reported on tensorflow itself: tensorflow/tensorflow#514

Possibly we have an old version of something without the fix.

Edit: tensorflow is 0.8.0, which should include the fix.

@agude / @karllab41 It looks like, at least as of the end of last year, the advice was to force embedding_lookup onto the CPU, a la tensorflow/tensorflow@3f0a031. I can't find any real discussion of whether this is a regression, or whether the successful runs (like the one in the notebook you linked to) were CPU-only and so never hit the GPU error. It appears the operation (embedding_lookup) simply isn't supported on GPUs, and the example word2vec implementation still pins that computation to the CPU (in tensorflow/master). Can we live with that for our use case?
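
For reference, here is a minimal sketch of that workaround, mirroring what the word2vec example does; the variable names and sizes below are illustrative, not from our codebase. The embedding table and the lookup op sit inside a CPU device scope, while anything defined outside the scope is still free to land on the GPU.

import tensorflow as tf

vocab_size, embed_dim = 50000, 128  # illustrative sizes

# Keep the (large) embedding table and the unsupported lookup op on the CPU.
with tf.device("/cpu:0"):
  embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
  word_ids = tf.placeholder(tf.int32, shape=[None])
  embedded = tf.nn.embedding_lookup(embeddings, word_ids)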

Actually, this was something I was thinking of getting to later rather than sooner. Our vocabulary is going to be sizable and is best not kept resident on the GPU anyway. Since this impacts only the final layer of any im2vec/grph2vec/word2vec architecture, it makes sense to put those computations on the CPU while keeping the dense matrix multiplications on the GPU.

Unfortunately, some thought would have to be put into how to implement this in TF (or any framework), i.e., how to pass the gradients back. When I asked Patrick, he said there are options for assigning nodes to different processing units, which might be the way to go.
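
For what it's worth, the gradient plumbing across devices may already be handled for us: if ops are placed under different device scopes, a standard optimizer's minimize() builds the backward pass and TensorFlow inserts whatever CPU/GPU transfers it needs. A rough sketch, assuming a simple two-layer setup with made-up sizes and names:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4096])       # e.g. image features
labels = tf.placeholder(tf.float32, [None, 300])   # e.g. target word vectors

# Dense matrix multiplications on the GPU.
with tf.device("/gpu:0"):
  w1 = tf.Variable(tf.truncated_normal([4096, 1024], stddev=0.1))
  hidden = tf.nn.relu(tf.matmul(x, w1))

# Final (vocabulary-facing) layer on the CPU.
with tf.device("/cpu:0"):
  w2 = tf.Variable(tf.truncated_normal([1024, 300], stddev=0.1))
  output = tf.matmul(hidden, w2)
  loss = tf.reduce_mean(tf.square(output - labels))

# minimize() builds the backward pass; TensorFlow adds the transfers
# needed to carry activations and gradients across the device boundary.
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)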

For now, I'm having Alex implement negative sampling in TF (for educational and practical purposes), so that (1) we have a semblance of a working version and (2) we understand the math. This will replace the GPU-offending "sampled_softmax" graph node and enable the algorithm to run entirely on the GPU.
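
As a reference point, here is roughly what that loss looks like written out by hand, with no sampled_softmax node: the skip-gram negative-sampling (SGNS) objective is just a log-sigmoid over one positive and k negative dot products. All names and shapes below are illustrative, and the lookups are pinned to the CPU to be safe given the error above.

import tensorflow as tf

vocab_size, embed_dim, num_neg = 50000, 128, 5  # illustrative sizes

with tf.device("/cpu:0"):
  in_embed = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
  out_embed = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))

  center_ids = tf.placeholder(tf.int32, [None])         # center words
  context_ids = tf.placeholder(tf.int32, [None])        # true context words
  neg_ids = tf.placeholder(tf.int32, [None, num_neg])   # sampled negatives

  center = tf.nn.embedding_lookup(in_embed, center_ids)     # [batch, dim]
  context = tf.nn.embedding_lookup(out_embed, context_ids)  # [batch, dim]
  negatives = tf.nn.embedding_lookup(out_embed, neg_ids)    # [batch, k, dim]

# The remaining arithmetic is plain multiplies/reductions and can run on the GPU.
pos_logits = tf.reduce_sum(center * context, 1)                       # [batch]
neg_logits = tf.reduce_sum(tf.expand_dims(center, 1) * negatives, 2)  # [batch, k]

# SGNS objective: push sigma(pos) toward 1 and sigma(neg) toward 0.
loss = -tf.reduce_mean(
    tf.log(tf.sigmoid(pos_logits) + 1e-10) +
    tf.reduce_sum(tf.log(tf.sigmoid(-neg_logits) + 1e-10), 1))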

agude commented

You can pin nodes to different devices, for example: http://stackoverflow.com/a/33624832/1342354

import tensorflow as tf

# Device function: send matrix multiplications to the GPU and
# everything else (including embedding lookups) to the CPU.
def device_for_node(n):
  if n.type == "MatMul":
    return "/gpu:0"
  else:
    return "/cpu:0"

graph = tf.Graph()
with graph.as_default():
  with graph.device(device_for_node):
    ...  # build the model here; each op is placed by device_for_node

I was trying at one point to get TensorFlow's official GPU Docker container running, but it didn't work out of the box, so I moved on after about five minutes.

That is exactly what we want!