fp16 winograd

Question


clarencewxl opened this issue 4 years ago · 1 comment

In the paper, you mentioned that the implementation can be ported to an fp16 version.
So, have you succeeded in implementing fp16 Winograd with Tensor Cores and beating the performance of cuDNN?

I found that cuDNN doesn't have an fp16 Winograd 3x3 convolution, only an fp16 GEMM-based 3x3 convolution. I have no idea why NVIDIA hasn't implemented one.

Answer 1 · 2020-11-21T05:28:54.000Z

Hi.

I have not implemented a fused Tensor Core fp16 Winograd kernel yet.

I believe cuDNN's non-fused Winograd leverages Tensor Cores.
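
For context on why a non-fused Winograd can map onto Tensor Cores at all: once the filter and input transforms are written out as separate steps, the multiply stage of F(2x2, 3x3) becomes 16 independent matrix products of shape (K x C) by (C x tiles), which is the kind of work matrix units are built for. The following is only a rough pure-Python sketch of that structure using the standard F(2x2, 3x3) transform matrices. It is not cuDNN's implementation and not the kernel from the paper, the function names (`winograd_nonfused`, `direct_conv`) are made up for illustration, and it runs in ordinary Python floats rather than fp16.

```python
import random

# Standard Winograd F(2x2, 3x3) transform matrices.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sandwich(m, x):
    """Return m @ x @ m^T."""
    return matmul(matmul(m, x), [list(r) for r in zip(*m)])


def winograd_nonfused(inp, filt):
    """inp: [C][H][W], filt: [K][C][3][3], H and W even. Returns [K][H-2][W-2]."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K = len(filt)
    tiles = [(y, x) for y in range(0, H - 2, 2) for x in range(0, W - 2, 2)]
    T = len(tiles)
    # Step 1: filter transform, U[k][c] = G g G^T (a 4x4 matrix per filter slice).
    U = [[sandwich(G, filt[k][c]) for c in range(C)] for k in range(K)]
    # Step 2: input transform, V[c][t] = B^T d B for each overlapping 4x4 tile.
    V = [[sandwich(BT, [row[x:x + 4] for row in inp[c][y:y + 4]])
          for (y, x) in tiles] for c in range(C)]
    # Step 3: 16 independent GEMMs, one per transformed position (i, j).
    # Each is (K x C) @ (C x T); this is the stage a matrix unit could run.
    M = [[None] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            Uij = [[U[k][c][i][j] for c in range(C)] for k in range(K)]
            Vij = [[V[c][t][i][j] for t in range(T)] for c in range(C)]
            M[i][j] = matmul(Uij, Vij)
    # Step 4: output transform, Y = A^T m A gives a 2x2 output tile.
    out = [[[0.0] * (W - 2) for _ in range(H - 2)] for _ in range(K)]
    for k in range(K):
        for t, (y, x) in enumerate(tiles):
            m = [[M[i][j][k][t] for j in range(4)] for i in range(4)]
            y_tile = sandwich(AT, m)
            for dy in range(2):
                for dx in range(2):
                    out[k][y + dy][x + dx] = y_tile[dy][dx]
    return out


def direct_conv(inp, filt):
    """Reference 3x3 correlation with no padding, for checking the result."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    return [[[sum(inp[c][y + r][x + s] * filt[k][c][r][s]
                  for c in range(C) for r in range(3) for s in range(3))
              for x in range(W - 2)] for y in range(H - 2)]
            for k in range(len(filt))]


random.seed(0)
C, K, H, W = 3, 2, 6, 8
inp = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
filt = [[[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
         for _ in range(C)] for _ in range(K)]
out, ref = winograd_nonfused(inp, filt), direct_conv(inp, filt)
max_err = max(abs(out[k][y][x] - ref[k][y][x])
              for k in range(K) for y in range(H - 2) for x in range(W - 2))
print("max abs difference vs direct convolution:", max_err)
```

In a fused kernel, steps 2 to 4 happen inside one kernel without writing the transformed tiles to global memory, which is what makes combining it with Tensor Core GEMMs harder than in the non-fused layout sketched here.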