Broken WITH_POPCNT logic
mkoeppe opened this issue · 17 comments
I guess one can use https://gitlab.inria.fr/cado-nfs/cado-nfs/-/blob/master/config/popcnt.cmake and the test program https://gitlab.inria.fr/cado-nfs/cado-nfs/-/blob/master/config/popcnt.c
From primecount's documentation:
Packaging primecount
When packaging primecount for e.g. a Linux distro it is best to change a few of the default options.
For performance reasons primecount uses the POPCNT
instruction on all CPU architectures that support it. On x86/x64 the POPCNT
instruction was added to Intel's and AMD's CPUs alongside the SSE4 instruction set in 2008. If you need to support older x86/x64 CPUs you can disable POPCNT
but this will deteriorate performance by about 50%. Note that disabling POPCNT
only has an effect on x86/x64, on other CPU architectures POPCNT
is always used if it is available (as this generally does not cause any issues).
cmake . -DWITH_POPCNT=OFF
The problem is that you're testing whether the compiler supports -mpopcnt
, not whether the CPU supports it.
I'd suggest to just test whether the compiler supports -march=native
, which will be safe and will include popcnt if the CPU supports it.
My post was incomplete, I didn't meant to post that...
I am aware of this issue and I am thinking about how to solve it. Unfortunately it is more complicated than it seems. I agree that the solution requires use of CMake's try_run
which cado-nfs also uses. However in my experience try_run
breaks cross compilation, I used to use try_run
in primecount but removed it because it caused too many issues. So I will have to research how to prevent try_run
from being used when cross-compliation is being used.
The problem is that you're testing whether the compiler supports -mpopcnt, not whether the CPU supports it.
If I check that the user's CPU supports the POPCNT instruction in primecount's CMakeLists.txt
build script then the built primecount package/library will always work on the PC that primecount has been compiled on. However, if the user's CPU supports the POPCNT instruction, then the built primecount package/library still cannot be distributed to other PCs that do not support the POPCNT instruction (because primecount has no runtime POPCNT detection).
Is this an issue for sagemath?
we have a "FAT" mode that targets the most basic CPU, and there indeed POPCNT
should be OFF
. But otherwise one doesn't want to set it to OFF
, if possible, to have good performance.
For the cross-compiling issue, one can use CMAKE_CROSSCOMPILING
- https://cmake.org/cmake/help/latest/variable/CMAKE_CROSSCOMPILING.html
Here is an example: https://github.com/getdnsapi/getdns/pull/472/files
we have a "FAT" mode that targets the most basic CPU, and there indeed POPCNT should be OFF. But otherwise one doesn't want to set it to OFF, if possible, to have good performance.
Is my understanding correct that you only need build time POPCNT detection (check if the host CPU supports the POPCNT instruction) for sagemath? And not runtime POPCNT detection using e.g. the CPUID instruction in primecount's C++ code (runtime POPCNT detection would add a lot of complexity, I really want to avoid this).
no, I don't think we need runtime detection of POPCNT, but we need proper build time detection (with cmake's try_run
, or
probably more convenient https://cmake.org/cmake/help/latest/module/CheckCSourceRuns.html)
OK, thanks for the clarification.
I have now implemented build time POPCNT detection for x86/x64 CPUs. When running primecount's CMake configuration there is now an additional cpu_supports_popcnt
check that is run for x86/x64 CPUs:
$ cd primecount
$ cmake .
...
-- Performing Test mpopcnt
-- Performing Test mpopcnt - Success
-- Performing Test cpu_supports_popcnt
-- Performing Test cpu_supports_popcnt - Success
...
I don't have access to an x86 CPU without POPCNT. Would you mind testing whether the new code works on such a system? (e.g. the system where you previously encountered this issue). The CMake output in this case should look like -- Performing Test cpu_supports_popcnt - Failed
and the compiled primecount version should work without any issues.
I don't have access to an x86 CPU without POPCNT. Would you mind testing whether the new code works on such a system? (e.g. the system where you previously encountered this issue). The CMake output in this case should look like
-- Performing Test cpu_supports_popcnt - Failed
and the compiled primecount version should work without any issues.
I've just tried it, and unfortunately the new test passes even when popcnt is not available. I see,
-- Performing Test cpu_supports_popcnt - Success
and the resulting executable still crashes on the illegal instruction. I've used gcc-11 for the build. It's an older thinkpad laptop sitting right beside me so I'm happy to test or investigate anything you like.
I've just tried it, and unfortunately the new test passes even when popcnt is not available.
I just tested my POPCNT detection code on godblot.org with -O3
and unexpectedly the POPCNT test got optimized away. This could explain your issue. Did you specify any such CXXFLAGS
options like -O2
or -O3
?
Ah, yes, I had CXXFLAGS="-O2 -pipe -march=native"
set. The test does work with CFLAGS
/CXXFLAGS
unset.
Thanks, I know how to fix this issue...
It should be fixed now (commit: 6efe678)
Can you please retest with optimization flags enabled (CXXFLAGS="-O2 -pipe -march=native"
) and the latest primecount source code?
It should be fixed now (commit: 6efe678)
Can you please retest with optimization flags enabled (
CXXFLAGS="-O2 -pipe -march=native"
) and the latest primecount source code?
I tried with a variety of optimization settings, and the correct result (i.e. Failure) is now returned in all cases. Thank you!
Thanks for the testing!