fp16 appears to break gpu collectives
Closed this issue · 4 comments
Hi,
Is this known / being worked on? When I try to call collectives under theano.config.floatX="float16", I get the error:
File "pygpu/collectives.pyx", line 257, in pygpu.collectives.GpuComm.broadcast (pygpu/collectives.c:5022)
File "pygpu/collectives.pyx", line 362, in pygpu.collectives.comm_broadcast (pygpu/collectives.c:6060)
pygpu.gpuarray.GpuArrayException: b'Invalid value or operation'
NCCL can definitely do half-precision.
Really hoping for that free 2x speedup ;)
@tsirif any idea? A quick look at the code seems to indicate it is supported.
Note, you won't get a 2x speedup from the Theano side. Theano only supports float16 storage, not float16 computation. But it will speed up memory transfers and memory access.
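A hedged illustration of the point above, using plain NumPy as a stand-in: with floatX="float16" the parameters are stored in float16 but the arithmetic is done in float32, so the win is in bytes moved, not in FLOPs. The array sizes here are arbitrary.

```python
import numpy as np

# What float16 "storage" buys you: half the bytes per element,
# so half the data moved across PCIe / GPU memory.
stored = np.ones(1_000_000, dtype=np.float16)

# Theano-style float16 computation: upcast to float32 before the math,
# then (optionally) downcast the result back to float16 for storage.
computed = stored.astype(np.float32)

print(stored.nbytes)    # 2 bytes per element
print(computed.nbytes)  # 4 bytes per element
```

The same trade-off is why the reporter still hopes for a speedup: collectives like broadcast move the stored (float16) buffers, so halving their size roughly halves the transfer time even without float16 arithmetic.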
It is trivial to add the correspondence between the types in gpuarray_collectives_cuda_nccl.c, and I believe it would not need any other modifications on the C side. In pygpu, however, I am not sure how I can establish such a correspondence between a NumPy type and float16 (maybe we should handle a custom type for float16 all the way...). Also, among the types defined in gpuarray/types.h, which one corresponds to float16? Is it GA_FLOAT16? Is it in use/could it be used?
It is the GA_HALF type code and the ga_float16 C type that correspond to float16.
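To make the correspondence concrete, here is a hedged sketch of the kind of lookup table the earlier comment describes: gpuarray type code name on one side, NCCL data type name and NumPy dtype on the other. The table itself and the ncclHalf/ncclFloat/ncclDouble names follow NCCL's documented data types, but this is an illustration, not the actual table in gpuarray_collectives_cuda_nccl.c.

```python
import numpy as np

# Illustrative mapping: gpuarray type code name -> (NCCL type name, NumPy dtype).
# GA_HALF is the float16 case this issue is about; an entry missing from the
# table is exactly the "Invalid value or operation" situation reported above.
GA_TO_NCCL = {
    "GA_HALF":   ("ncclHalf",   np.float16),
    "GA_FLOAT":  ("ncclFloat",  np.float32),
    "GA_DOUBLE": ("ncclDouble", np.float64),
}

def nccl_type_for(ga_name):
    """Return the NCCL type name for a gpuarray type code, or None if unsupported."""
    entry = GA_TO_NCCL.get(ga_name)
    return entry[0] if entry else None

print(nccl_type_for("GA_HALF"))   # ncclHalf
print(nccl_type_for("GA_INT"))    # None -> would surface as an error to the caller
```

On the design point: once GA_HALF maps to ncclHalf, the collective itself needs no float16-specific logic, since NCCL reductions and broadcasts operate on the declared element type.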
The other PR was merged, so to my knowledge everything should work now. Closing. If you find any other problems, let us know.