Fast multiplication of single-precision and half-precision matrices on Tensor Cores
Primary language: CUDA