Update SSE2NEON header
jserv opened this issue · 2 comments
jserv commented
At present, kram includes an old copy of the SSE2NEON header, which can be replaced with the latest one: https://github.com/DLTcollab/sse2neon
The latest SSE2NEON already makes use of AArch64-specific instructions.
alecazam commented
Ah great! I haven't tested the Linux/Win Neon path yet, and am already using Apple's SIMD on iOS/Mac. I'll make an update, so thanks for the tip!
alecazam commented
I had to comment out a few GCC push/pop pragmas, and I changed a (-c) construct to (-(int32_t)c) to avoid a precision-loss warning. But the latest is pushed. I also added fp16 <-> fp32 AVX ops in float4a to/fromFloat16, and didn't see those in sse2neon. I'm using _Float16 on Mac/iOS, but MSVS doesn't appear to support it.
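For compilers without _Float16 or hardware conversion ops, a portable scalar fallback for the fp16 <-> fp32 conversion could look roughly like the sketch below. This is not kram's or sse2neon's code: the function names half_to_float and float_to_half are hypothetical, and it assumes IEEE 754 binary16/binary32 layouts with round-to-nearest-even on the narrowing side.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen IEEE 754 binary16 bits to a 32-bit float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign; /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            uint32_t shift = 0;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                shift++;
            }
            mant &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13); /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13); /* rebias 15 -> 127 */
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: narrow a float to binary16 bits, round-to-nearest-even. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint32_t sign = (bits >> 16) & 0x8000u;
    uint32_t mag = bits & 0x7FFFFFFFu;
    uint32_t h, rem, halfway;

    if (mag > 0x7F800000u)
        return (uint16_t)(sign | 0x7E00u); /* NaN becomes a quiet NaN */
    if (mag >= 0x47800000u)
        return (uint16_t)(sign | 0x7C00u); /* >= 65536 or infinity */
    if (mag < 0x33000000u)
        return (uint16_t)sign; /* below 2^-25: rounds to zero */

    if (mag >= 0x38800000u) {
        /* Normal half range: rebias the exponent, drop 13 mantissa bits. */
        h = (mag - 0x38000000u) >> 13;
        rem = mag & 0x1FFFu;
        halfway = 0x1000u;
    } else {
        /* Subnormal half: express the value in units of 2^-24. */
        uint32_t shift = 126u - (mag >> 23); /* 14..24 in this range */
        uint32_t mant = (mag & 0x7FFFFFu) | 0x800000u;
        h = mant >> shift;
        rem = mant & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    }

    /* Round to nearest, ties to even; a carry may roll over into infinity. */
    if (rem > halfway || (rem == halfway && (h & 1u)))
        h++;

    return (uint16_t)(sign | h);
}
```

The explicit (uint16_t) casts on the return values are there for the same reason as the (-(int32_t)c) change: they keep narrowing conversions visible and avoid precision-loss warnings on stricter compilers.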