fp16 appears to break gpu collectives
Closed this issue · 4 comments
Hi,
Is this known / being worked on? When I try to call collectives under theano.config.floatX="float16", I get the error:
File "pygpu/collectives.pyx", line 257, in pygpu.collectives.GpuComm.broadcast (pygpu/collectives.c:5022)
File "pygpu/collectives.pyx", line 362, in pygpu.collectives.comm_broadcast (pygpu/collectives.c:6060)
pygpu.gpuarray.GpuArrayException: b'Invalid value or operation'
NCCL can definitely do half-precision.
Really hoping for that free 2x speedup ;)
@tsirif any idea? A quick look at the code seems to indicate it is supported.
Note, you won't get a 2x speedup from the Theano side. Theano only supports float16 storage, not float16 computation. But it will speed up memory transfers and memory access.
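A hedged illustration of the point above, using plain NumPy as a stand-in: with floatX="float16" the parameters are stored in float16 but the arithmetic is done in float32, so the win is in bytes moved, not in FLOPs. The array sizes here are arbitrary.

```python
import numpy as np

# What float16 "storage" buys you: half the bytes per element,
# so half the data moved across PCIe / GPU memory.
stored = np.ones(1_000_000, dtype=np.float16)

# Theano-style float16 computation: upcast to float32 before the math,
# then (optionally) downcast the result back to float16 for storage.
computed = stored.astype(np.float32)

print(stored.nbytes)    # 2 bytes per element
print(computed.nbytes)  # 4 bytes per element
```

The same trade-off is why the reporter still hopes for a speedup: collectives like broadcast move the stored (float16) buffers, so halving their size roughly halves the transfer time even without float16 arithmetic.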
It is trivial to add the correspondence between the types in gpuarray_collectives_cuda_nccl.c, and I believe it would not need any other modifications on the C side. In pygpu, however, I am not sure how I can establish such a correspondence between a NumPy type and float16 (maybe we should handle a custom type for float16 all the way...). Also, among the types defined in gpuarray/types.h, which one corresponds to float16? Is it GA_FLOAT16? Is it in use/could it be used?
It is the GA_HALF type code and the ga_float16 C type that correspond to float16.
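To make the correspondence concrete, here is a hedged sketch of the kind of lookup table the earlier comment describes: gpuarray type code name on one side, NCCL data type name and NumPy dtype on the other. The table itself and the ncclHalf/ncclFloat/ncclDouble names follow NCCL's documented data types, but this is an illustration, not the actual table in gpuarray_collectives_cuda_nccl.c.

```python
import numpy as np

# Illustrative mapping: gpuarray type code name -> (NCCL type name, NumPy dtype).
# GA_HALF is the float16 case this issue is about; an entry missing from the
# table is exactly the "Invalid value or operation" situation reported above.
GA_TO_NCCL = {
    "GA_HALF":   ("ncclHalf",   np.float16),
    "GA_FLOAT":  ("ncclFloat",  np.float32),
    "GA_DOUBLE": ("ncclDouble", np.float64),
}

def nccl_type_for(ga_name):
    """Return the NCCL type name for a gpuarray type code, or None if unsupported."""
    entry = GA_TO_NCCL.get(ga_name)
    return entry[0] if entry else None

print(nccl_type_for("GA_HALF"))   # ncclHalf
print(nccl_type_for("GA_INT"))    # None -> would surface as an error to the caller
```

On the design point: once GA_HALF maps to ncclHalf, the collective itself needs no float16-specific logic, since NCCL reductions and broadcasts operate on the declared element type.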
The other PR was merged, so to my knowledge everything should work now. Closing. If you find any other problems, let us know.