kimwalisch/primecount

Broken WITH_POPCNT logic

mkoeppe opened this issue · 17 comments

From primecount's documentation:

Packaging primecount

When packaging primecount for e.g. a Linux distro it is best to change a few of the default options.

For performance reasons primecount uses the POPCNT instruction on all CPU architectures that support it. On x86/x64 the POPCNT instruction was added to Intel's and AMD's CPUs alongside the SSE4 instruction set in 2008. If you need to support older x86/x64 CPUs you can disable POPCNT but this will deteriorate performance by about 50%. Note that disabling POPCNT only has an effect on x86/x64, on other CPU architectures POPCNT is always used if it is available (as this generally does not cause any issues).

  • cmake . -DWITH_POPCNT=OFF

The problem is that you're testing whether the compiler supports -mpopcnt, not whether the CPU supports it.

I'd suggest to just test whether the compiler supports -march=native, which will be safe and will include popcnt if the CPU supports it.

My post was incomplete, I didn't meant to post that...

I am aware of this issue and I am thinking about how to solve it. Unfortunately it is more complicated than it seems. I agree that the solution requires use of CMake's try_run which cado-nfs also uses. However in my experience try_run breaks cross compilation, I used to use try_run in primecount but removed it because it caused too many issues. So I will have to research how to prevent try_run from being used when cross-compliation is being used.

The problem is that you're testing whether the compiler supports -mpopcnt, not whether the CPU supports it.

If I check that the user's CPU supports the POPCNT instruction in primecount's CMakeLists.txt build script then the built primecount package/library will always work on the PC that primecount has been compiled on. However, if the user's CPU supports the POPCNT instruction, then the built primecount package/library still cannot be distributed to other PCs that do not support the POPCNT instruction (because primecount has no runtime POPCNT detection).

Is this an issue for sagemath?

we have a "FAT" mode that targets the most basic CPU, and there indeed POPCNT should be OFF. But otherwise one doesn't want to set it to OFF, if possible, to have good performance.


For the cross-compiling issue, one can use CMAKE_CROSSCOMPILING - https://cmake.org/cmake/help/latest/variable/CMAKE_CROSSCOMPILING.html

Here is an example: https://github.com/getdnsapi/getdns/pull/472/files

we have a "FAT" mode that targets the most basic CPU, and there indeed POPCNT should be OFF. But otherwise one doesn't want to set it to OFF, if possible, to have good performance.

Is my understanding correct that you only need build time POPCNT detection (check if the host CPU supports the POPCNT instruction) for sagemath? And not runtime POPCNT detection using e.g. the CPUID instruction in primecount's C++ code (runtime POPCNT detection would add a lot of complexity, I really want to avoid this).

no, I don't think we need runtime detection of POPCNT, but we need proper build time detection (with cmake's try_run, or
probably more convenient https://cmake.org/cmake/help/latest/module/CheckCSourceRuns.html)

OK, thanks for the clarification.

I have now implemented build time POPCNT detection for x86/x64 CPUs. When running primecount's CMake configuration there is now an additional cpu_supports_popcnt check that is run for x86/x64 CPUs:

$ cd primecount
$ cmake . 
...
-- Performing Test mpopcnt
-- Performing Test mpopcnt - Success
-- Performing Test cpu_supports_popcnt
-- Performing Test cpu_supports_popcnt - Success
...

I don't have access to an x86 CPU without POPCNT. Would you mind testing whether the new code works on such a system? (e.g. the system where you previously encountered this issue). The CMake output in this case should look like -- Performing Test cpu_supports_popcnt - Failed and the compiled primecount version should work without any issues.

I don't have access to an x86 CPU without POPCNT. Would you mind testing whether the new code works on such a system? (e.g. the system where you previously encountered this issue). The CMake output in this case should look like -- Performing Test cpu_supports_popcnt - Failed and the compiled primecount version should work without any issues.

I've just tried it, and unfortunately the new test passes even when popcnt is not available. I see,

-- Performing Test cpu_supports_popcnt - Success

and the resulting executable still crashes on the illegal instruction. I've used gcc-11 for the build. It's an older thinkpad laptop sitting right beside me so I'm happy to test or investigate anything you like.

I've just tried it, and unfortunately the new test passes even when popcnt is not available.

I just tested my POPCNT detection code on godblot.org with -O3 and unexpectedly the POPCNT test got optimized away. This could explain your issue. Did you specify any such CXXFLAGS options like -O2 or -O3?

Ah, yes, I had CXXFLAGS="-O2 -pipe -march=native" set. The test does work with CFLAGS/CXXFLAGS unset.

Thanks, I know how to fix this issue...

It should be fixed now (commit: 6efe678)

Can you please retest with optimization flags enabled (CXXFLAGS="-O2 -pipe -march=native") and the latest primecount source code?

It should be fixed now (commit: 6efe678)

Can you please retest with optimization flags enabled (CXXFLAGS="-O2 -pipe -march=native") and the latest primecount source code?

I tried with a variety of optimization settings, and the correct result (i.e. Failure) is now returned in all cases. Thank you!

Thanks for the testing!