Theano/libgpuarray

fp16 appears to break gpu collectives

Hi,

Is this known / being worked on? When I try to call GPU collectives with theano.config.floatX = "float16", I get the following error:

File "pygpu/collectives.pyx", line 257, in pygpu.collectives.GpuComm.broadcast (pygpu/collectives.c:5022)
  File "pygpu/collectives.pyx", line 362, in pygpu.collectives.comm_broadcast (pygpu/collectives.c:6060)
pygpu.gpuarray.GpuArrayException: b'Invalid value or operation'
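
For reference, a minimal sketch of the kind of call that fails for me (the device name and the single-rank clique are just for illustration, to keep the sketch short):

```python
from pygpu import collectives, gpuarray

ctx = gpuarray.init('cuda0')
cid = collectives.GpuCommCliqueId(context=ctx)
comm = collectives.GpuComm(cid, 1, 0)   # ndev=1, rank=0

# The same broadcast works with dtype='float32'; with 'float16' it raises
# GpuArrayException: b'Invalid value or operation'
x = gpuarray.zeros((1024,), dtype='float16', context=ctx)
comm.broadcast(x, root=0)
```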

NCCL can definitely do half-precision.

Really hoping for that free 2x speedup ;)

nouiz commented

@tsirif any idea? A quick look at the code seems to indicate it is supported.

Note that you won't get a 2x speed-up from the Theano side: Theano only supports float16 storage, not float16 computation. It will, however, speed up memory transfers and memory access.
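
Roughly, what float16 gives you today looks like this (a sketch, assuming the gpuarray backend with device=cuda; names are illustrative):

```python
import numpy as np
import theano
import theano.tensor as T

theano.config.floatX = 'float16'

# Parameters and inputs are *stored* as float16, so memory use and
# host<->GPU transfers are roughly halved...
w = theano.shared(np.random.randn(1024, 1024).astype('float16'))
x = T.matrix('x')            # dtype float16 under this floatX
f = theano.function([x], T.dot(x, w))

# ...but the arithmetic itself is still carried out at higher precision
# internally, so you don't get a 2x reduction in compute time.
```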

tsirif commented

It is trivial to add the correspondence between the types in gpuarray_collectives_cuda_nccl.c, and I believe it would not need any other modifications on the C side. In pygpu, however, I am not sure how I can establish such a correspondence between a NumPy type and float16 (maybe we should handle a custom type for float16 all the way...). Also, among the types defined in gpuarray/types.h, which one corresponds to float16? Is it GA_FLOAT16? Is it in use / could it be used?
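
I suppose the natural entry point on the pygpu side would be the existing dtype-to-typecode mapping, if it already handles float16 (I have not verified this):

```python
import numpy as np
from pygpu import gpuarray

# If float16 is wired up, this should return the typecode that the
# C library uses for half-precision arrays.
print(gpuarray.dtype_to_typecode(np.dtype('float16')))
print(gpuarray.dtype_to_typecode(np.dtype('float32')))  # GA_FLOAT, for comparison
```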

nouiz commented

It is GA_HALF and the ga_float16 C type that correspond to float16.

nouiz commented

The other PR was merged, so to my knowledge everything should now work. Closing. If you find any other problems, let us know.