Optimize _mm_round_pd within 2^52 only
howjmay opened this issue · 2 comments
For rounding doubles in round-to-nearest mode, the magic number 2^52 can be used to round any value whose magnitude is within that range. Should we replace the current plain-C implementation on ARMv7 with a magic-number version? It does discard quite a lot of the valid range, though.
We can refer to the implementation here.
https://github.com/numpy/numpy/blob/main/numpy/core/src/common/simd/neon/math.h#L285-L299
Can you show some evaluation on error rate? Any progress?
I may close this proposal, since it is a limited solution that works only under truncation, and losing 12 bits of information is quite a lot.