KernelTuner/kernel_float

Add `fast` versions for `sqrt`, `rsqrt`, and `recip`


CUDA supports fast approximate versions of the following functions, and we should support them as well:

f32:

  • sqrt
  • rsqrt
  • rcp
  • sin
  • cos
  • div
  • log
  • log2
  • log10
  • exp
  • exp2
  • exp10

f64:

  • rcp
  • rsqrt
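
To make the request concrete, here is a rough sketch of what a few of these wrappers could look like. This is only an illustration, not kernel_float's actual API: the `sketch` namespace, the `SKETCH_HOST_DEVICE` macro, and the `fast_*` function names are all made up for this example. It assumes the device path uses the PTX `.approx` instructions (via inline assembly) or the CUDA intrinsics `__sinf` and `__fdividef`, and that host code simply falls back to the exact `<cmath>` functions.

```cpp
// Hypothetical sketch: names below are illustrative, not kernel_float's API.
#include <cmath>

#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

namespace sketch {

// f32 square root: PTX sqrt.approx.f32 on device, exact std::sqrt on host.
SKETCH_HOST_DEVICE inline float fast_sqrt(float x) {
#if defined(__CUDA_ARCH__)
    float r;
    asm("sqrt.approx.f32 %0, %1;" : "=f"(r) : "f"(x));
    return r;
#else
    return std::sqrt(x);
#endif
}

// f32 reciprocal square root: PTX rsqrt.approx.f32 on device.
SKETCH_HOST_DEVICE inline float fast_rsqrt(float x) {
#if defined(__CUDA_ARCH__)
    float r;
    asm("rsqrt.approx.f32 %0, %1;" : "=f"(r) : "f"(x));
    return r;
#else
    return 1.0f / std::sqrt(x);
#endif
}

// f32 reciprocal: PTX rcp.approx.f32 on device.
SKETCH_HOST_DEVICE inline float fast_rcp(float x) {
#if defined(__CUDA_ARCH__)
    float r;
    asm("rcp.approx.f32 %0, %1;" : "=f"(r) : "f"(x));
    return r;
#else
    return 1.0f / x;
#endif
}

// f32 sine: CUDA intrinsic __sinf on device.
SKETCH_HOST_DEVICE inline float fast_sin(float x) {
#if defined(__CUDA_ARCH__)
    return __sinf(x);
#else
    return std::sin(x);
#endif
}

// f32 division: CUDA intrinsic __fdividef on device.
SKETCH_HOST_DEVICE inline float fast_div(float a, float b) {
#if defined(__CUDA_ARCH__)
    return __fdividef(a, b);
#else
    return a / b;
#endif
}

// f64 reciprocal: PTX rcp.approx.ftz.f64 on device (a coarse approximation).
SKETCH_HOST_DEVICE inline double fast_rcp(double x) {
#if defined(__CUDA_ARCH__)
    double r;
    asm("rcp.approx.ftz.f64 %0, %1;" : "=d"(r) : "d"(x));
    return r;
#else
    return 1.0 / x;
#endif
}

}  // namespace sketch
```

The remaining f32 entries (`cos`, `log`, `log2`, `log10`, `exp`, `exp10`) would presumably follow the same pattern with `__cosf`, `__logf`, `__log2f`, `__log10f`, `__expf`, and `__exp10f`. The host fallbacks are exact, so results will differ between host and device; whether that is acceptable is a design choice for the library.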