parallella/pal

Fast/Approximate math function section

Opened this issue · 2 comments

syoyo commented

It would be good to have a section in PAL for faster/approximate math functions, designed specifically for running math kernels on Epiphany with selectable precision.

For example,

  • Full precision / low performance (reference): e.g., expf() from newlib.
  • Middle precision / middle performance (20 ~ 100 clocks, general use): e.g., e_approx_exp()
  • Low precision / high performance (~20 clocks, DSP use): e.g., e_fast_exp()
  • Short vectorized versions for middle and low precision.
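
To make the low-precision tier concrete, here is a minimal sketch of the well-known exponent-field trick (described by Schraudolph), which computes the bit pattern of the result with one multiply-add and one float-to-int conversion. The name `sketch_fast_expf` is hypothetical and this is not the proposed e_fast_exp(); the constants are the commonly used single-precision ones, and the expected relative error of a few percent is my own estimate, not a measurement on Epiphany.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical low-precision exp(x) sketch; NOT the e_fast_exp() proposed above.
 * Idea: a float's bit pattern, read as an integer, is roughly
 * 2^23 * (log2(value) + 127). So exp(x) can be approximated by building
 * the integer 2^23 * (x / ln2 + 127) and reinterpreting it as a float. */
static float sketch_fast_expf(float x)
{
    /* Clamp so the integer below stays in the range of positive normal floats. */
    if (x > 88.0f)  x = 88.0f;
    if (x < -87.0f) x = -87.0f;

    /* 12102203 is about 2^23 / ln2.
     * 1064866805 = 127 * 2^23 - 486411, where the 486411 offset shifts the
     * piecewise-linear mantissa to balance the error around zero. */
    int32_t i = (int32_t)(12102203.0f * x + 1064866805.0f);

    /* Reinterpret the integer bits as a float without aliasing problems. */
    float y;
    memcpy(&y, &i, sizeof y);
    return y;
}
```

The result is only good to about two significant digits, which is the trade-off this tier is meant to offer: no polynomial, no table, and very little code.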

I think this design is well suited to actual application programmers.

And here is a validated approximate exp() implementation for Epiphany:

https://github.com/syoyo/parallella-playground/blob/master/math_exp/e_fast_exp.c

It takes 25 ~ 74 clocks per exp(x) evaluation, within 1.0e-5 relative error.
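
For readers who have not opened the link, the general shape of a middle-precision exp() can be sketched as range reduction followed by a short polynomial. This is an illustration under my own assumptions, not the code in e_fast_exp.c: the name `sketch_approx_expf`, the degree-5 Taylor polynomial, and the clamping limits are all hypothetical choices, and the error (roughly 1e-5 relative or better for moderate x, by my estimate) has not been measured on Epiphany.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical middle-precision exp(x) sketch; NOT the linked implementation.
 * Steps: write x = k*ln2 + r with |r| <= ln2/2, evaluate exp(r) with a
 * short polynomial, then multiply by 2^k built from the exponent bits. */
static float sketch_approx_expf(float x)
{
    /* Clamp so that 2^k below is always a normal float (no denormals). */
    if (x > 88.0f)  x = 88.0f;
    if (x < -87.0f) x = -87.0f;

    /* Range reduction: k = round(x / ln2), r = x - k*ln2. */
    float t = x * 1.44269504f;                              /* x / ln2 */
    int32_t k = (int32_t)(t + (t >= 0.0f ? 0.5f : -0.5f));  /* round to nearest */
    float r = x - (float)k * 0.693147181f;

    /* Degree-5 Taylor polynomial for exp(r) in Horner form.
     * Each step is a multiply-add, which maps well onto FMA hardware. */
    float p = 1.0f + r * (1.0f + r * (0.5f + r * (1.0f / 6.0f
                  + r * (1.0f / 24.0f + r * (1.0f / 120.0f)))));

    /* Build 2^k by placing the biased exponent (k + 127) in bits 23..30. */
    uint32_t bits = (uint32_t)(k + 127) << 23;
    float scale;
    memcpy(&scale, &bits, sizeof scale);

    return p * scale;
}
```

A tuned minimax polynomial would give lower error for the same degree than the Taylor coefficients used here; they are used only because they are easy to check by eye.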

I appreciate what you are saying here. My concern is that the library would become too big too quickly.
- What is the difference in precision between the low, medium, and high precision examples?
- What is the difference in code size for your exp at low, medium, and high precision?
- The newlib implementation is far too large to be practical for most embedded devices. Can it be reduced to something practical in terms of code size while keeping the precision?

syoyo commented

> The newlib implementation is far too large to be practical for most embedded devices. Can it be reduced to something practical in terms of code size while keeping the precision?

There is R&D on faster (non SW-based), formally verified, full-precision, correctly rounded math libraries, for example:

But I'm not sure this could be implemented on Epiphany, since Epiphany lacks some IEEE 754 features (although FMA helps a lot). It may also require a lot of engineering and research resources to port such math functions to Epiphany.