icecube/photospline

Non-portable instructions


-msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mavx -march=native causes the compiler to emit instructions that are not supported on older CPUs. Omitting these entirely and relying only on x86_64's required SSE support slows down 4D multiple gradient evaluations by a factor of 2. Figure out which of these are critical, and decide how much we need to support older CPUs.

It may be possible to use function multi-versioning to automatically dispatch to architecture-specific versions of functions that may incorporate AVX instructions. Apparently Clang and ICC support a similar mechanism. VC++ has nothing of the sort, but Windows support is very low on my personal priority list.
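For reference, a minimal sketch of what GCC's function multi-versioning could look like, using the target_clones attribute (GCC 6 and later). The function weighted_sum and the MULTIVERSION macro are hypothetical stand-ins for illustration, not photospline code, and the guard assumes x86-64 Linux with ifunc support; everywhere else it falls back to a single plain definition.

```cpp
#include <cassert>
#include <cstddef>

// target_clones relies on ifunc dispatch, so only enable it where that is
// known to work (assumption: GCC on x86-64 Linux). Elsewhere, compile one
// ordinary version of the function.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__linux__)
#define MULTIVERSION __attribute__((target_clones("avx", "default")))
#else
#define MULTIVERSION
#endif

// Hypothetical hot loop. With MULTIVERSION active, the compiler emits an AVX
// clone and a baseline clone, plus a resolver that picks one at load time.
MULTIVERSION
double weighted_sum(const double* weights, const double* values, std::size_t n) {
    assert(weights != nullptr && values != nullptr);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += weights[i] * values[i];
    }
    return total;
}
```

Callers just call weighted_sum() as usual; the dispatch is invisible to them, which is the main attraction over hand-written dispatch.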

The bug label is no longer appropriate after 1742b3a.

AVX does turn out to have a use (#11), but only in templated functions that are going to be emitted into dependent libraries anyway. One "solution" is therefore to punt, and require that users know they should dispatch to either an AVX-enabled ndssplineeval_gradient<double>() or evaluate each element of the gradient individually. This doesn't seem ideal, though.

Also, GCC function multi-versioning seems to have gotten even better in version 6, but that will not be deployed on many platforms.

So it seems like, on x86, we just want two versions: an AVX gradient evaluator and a non-AVX gradient evaluator. How often, in the wild, do you still encounter non-AVX x86 CPUs, though? AVX first shipped seven years ago.
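A rough sketch of the two-version idea with hand-written runtime dispatch, assuming GCC or Clang on x86. It uses the __builtin_cpu_supports builtin and the target("avx") function attribute; the kernel names and the GradientKernel typedef are made up for illustration and do not correspond to anything in photospline.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

// Hypothetical kernel signature: out[i] = scale * in[i].
using GradientKernel = void (*)(const double* in, double scale, double* out, std::size_t n);

// Baseline version: built with whatever the translation unit's flags allow.
void scale_gradient_baseline(const double* in, double scale, double* out, std::size_t n) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) out[i] = scale * in[i];
}

#if HAVE_X86_DISPATCH
// Same loop, but the compiler may use AVX instructions in this function only.
__attribute__((target("avx")))
void scale_gradient_avx(const double* in, double scale, double* out, std::size_t n) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) out[i] = scale * in[i];
}
#endif

// Pick the AVX kernel only if the CPU we are running on reports AVX support.
GradientKernel select_gradient_kernel() {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) return scale_gradient_avx;
#endif
    return scale_gradient_baseline;
}
```

The selection would typically be done once and the pointer cached, so the per-call cost is a single indirect call.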

nega0 commented

Got bit by this today on a K10 which lacks SSE3, SSE4, FMA, and AVX.

Not sure what the best option is in 2022: function multi-versioning, or configure-time detection and multiple compilation units.

I do propose we switch to -march=native and let the compiler do its thing instead of listing out the instruction sets. This is what I'm going to be doing locally. For packaging we'd have to come up w/ a sane default such as -march=nehalem or -march=core2.
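One way to sanity-check what a given -march setting actually enables is to look at the feature macros that GCC and Clang predefine (__SSE3__, __AVX__, and so on). The helper below is a hypothetical sketch for that purpose, not part of photospline; it only reports what the compiler was allowed to use for this translation unit, not what the CPU running it supports.

```cpp
#include <string>

// Returns a space-separated list of the x86 SIMD extensions enabled at
// compile time, or "baseline" if none of the checked macros are defined.
std::string enabled_isa_extensions() {
    std::string found;
#ifdef __SSE2__
    found += "sse2 ";
#endif
#ifdef __SSE3__
    found += "sse3 ";
#endif
#ifdef __SSE4_1__
    found += "sse4.1 ";
#endif
#ifdef __SSE4_2__
    found += "sse4.2 ";
#endif
#ifdef __AVX__
    found += "avx ";
#endif
#ifdef __FMA__
    found += "fma ";
#endif
    if (found.empty()) found = "baseline";
    return found;
}
```

Built with -march=nehalem this should list extensions up to sse4.2 but not avx, which is the kind of packaging default discussed above; with -march=native the result depends entirely on the build machine.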

I want to do function multi-versioning in the long run (as I think our reliance on gcc <6 is becoming fairly small, and clang has had okay support for a few versions now), but it will require some non-trivial refactoring, I think, and I haven't found the time to do it in earnest.