qcxms/QCxMS

setting of compiler options so everything is recompiled with new settings

Opened this issue · 4 comments

Hi,
I tried some more aggressive compiler options ("-fast -parallel -xCORE-AVX2"), especially because Intel VTune is free now and some of the settings showed large speedups, particularly the AVX options and IPO:

(https://softline.ru/uploads/98/7d/b5/c9/8c/f9/29/88/2b/origin.pdf)
and
(https://www.fz-juelich.de/SharedDocs/Downloads/IAS/JSC/EN/slides/supercomputer-ressources-2020-11/16-tuning_intel.pdf)

But I wonder where I should put them. Into the /config/meson.build file? I added them there, but they did not seem to be applied to all of the code; at least I could not see them in the log files. Or should I pass them via a meson option?

elif fc_id == 'intel'
  add_project_arguments(
    '-traceback',
    '-Ofast',
    '-axCORE-AVX2',
    language: 'fortran',
  )
  add_project_arguments(
    '-DLINUX',
    language: 'c',
  )

A related question: are the Fortran compiler options completely different from the C compiler options? Unfortunately, Windows and Linux also seem to use different conventions for passing the options. For example, Linux requires "-march=core-avx2".

See
https://software.intel.com/content/www/us/en/develop/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations.html

Thanks.
Tobias

You can always tweak your build by providing -Dfortran_args and -Dfortran_link_args in the configuration step; this applies the selected options globally. Using add_project_arguments and add_project_link_arguments in the build file applies those options only to the current project (QCxMS), not to the complete stack (tblite, dftd4, ...). In either case, meson will recompile all source files whose compile arguments changed and relink all executables whose link arguments changed.
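As a rough illustration of the configuration step described above, the following Python sketch assembles a meson setup command line that passes global Fortran flags. The helper name, the build directory name, and the example flags (copied from this thread) are assumptions for illustration, not a recommended set.

```python
import shlex


def meson_setup_command(build_dir, fortran_args, fortran_link_args=()):
    """Return the argv list for configuring build_dir with global Fortran flags."""
    cmd = ["meson", "setup", build_dir]
    if fortran_args:
        # Several flags go into one option value, separated by spaces.
        cmd.append("-Dfortran_args=" + " ".join(fortran_args))
    if fortran_link_args:
        cmd.append("-Dfortran_link_args=" + " ".join(fortran_link_args))
    return cmd


# Hypothetical build directory name; flags as used elsewhere in this thread.
cmd = meson_setup_command(
    "build_avx512",
    ["-fast", "-parallel", "-axCORE-AVX512"],
    ["-static"],
)
print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting mistakes when it is handed to subprocess.run; shlex.join is only used here to print a copy-pasteable form.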

Finally, there is no definite answer to compiler optimization. It very much depends on the host system(s) you are planning to use and on how portable the resulting binary is supposed to be. In any case, it is important to study the compiler manual very carefully for usable options and their limitations.

As a tip, stay away from -x and -march options unless you let your compiler make the correct choice for you with -xHOST (for Intel micro-architectures) or -march=native (for non-Intel micro-architectures). It is preferable to cross-compile for micro-architectures instead, using -ax (Intel) and -mtune (non-Intel) in addition to the default SSE2 fallback code path (this will yield larger binaries).

Thanks. I tried different settings, but I did not see that the options were actually used; the meson/ninja logs only showed -static and -O2, which I wanted to override. The compiler produces different-sized binaries, but my goal was to recompile everything, including submodules, with my settings (for AVX512 or AVX2); it's just for a specific platform. So far I have not been able to do that. According to the Phoronix compiler benchmarks, some code could see a 20-30% speedup, which for calculations that last many hours could be a nice tweak. https://www.phoronix.com/scan.php?page=article&item=gcc9-skylake-avx512&num=2



meson setup build -Dfortran_link_args=-static -Dfortran_args="-fast -parallel -axCORE-AVX512" --reconfigure

meson setup build -Dfortran_link_args=-static -Dfortran_args="-fast -parallel -xAVX"


Maybe I am using the wrong settings.
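One possible way to check whether the flags reached every compile step is to inspect compile_commands.json, which Meson's Ninja backend writes into the build directory. The sketch below assumes that this file also lists the Fortran and subproject compile steps; the function names and the sample entries are hypothetical.

```python
import json
import shlex
from pathlib import Path


def load_entries(compdb_path):
    """Read a compilation database (a JSON list of compile steps)."""
    return json.loads(Path(compdb_path).read_text())


def files_missing_flag(entries, flag):
    """Return the source files whose recorded command does not contain flag."""
    missing = []
    for entry in entries:
        # An entry carries either an "arguments" list or a "command" string.
        args = entry.get("arguments") or shlex.split(entry["command"])
        if flag not in args:
            missing.append(entry["file"])
    return missing


# Hypothetical entries, shaped like those in a compilation database.
sample = [
    {"file": "src/main.f90", "command": "ifort -O2 -axCORE-AVX2 -c src/main.f90"},
    {"file": "subprojects/dep/lib.f90", "command": "ifort -O2 -c subprojects/dep/lib.f90"},
]
print(files_missing_flag(sample, "-axCORE-AVX2"))
```

For a real build, the sample list would be replaced by load_entries("build/compile_commands.json"); an empty result would suggest the flag was applied to every listed file.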

Depending on the options chosen, you can gain up to a factor of two in run time, at the cost of a build time that is 3 to 5 times longer, which makes this non-routine IMO. Always keep in mind that using aggressive optimization options requires an extensive test suite to verify the correctness of the produced artifact.
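A minimal sketch of such a correctness check, assuming the reference build and the aggressively optimized build each produce a list of numbers (for example energies) for the same input. The function name and the tolerances are placeholders, not values taken from any of the projects mentioned here.

```python
import math


def deviating_indices(reference, candidate, rel_tol=1e-8, abs_tol=1e-10):
    """Return the positions where candidate differs from reference beyond tolerance."""
    if len(reference) != len(candidate):
        raise ValueError("result lists differ in length")
    return [
        i
        for i, (ref, cand) in enumerate(zip(reference, candidate))
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical numbers from an -O2 build and an aggressively optimized build.
reference_values = [-40.123456789, 0.0, 2.0]
optimized_values = [-40.123456789, 0.0, 2.0000001]
print(deviating_indices(reference_values, optimized_values))
```

Aggressive floating-point options can change rounding, so an exact comparison would flag harmless differences; which tolerance is acceptable depends on the quantity being compared.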

Selecting the options is possible as described above, and your configure command looks correct to me. Generally, avoid reconfiguring builds, or you will run into issues with stale build artifacts, which might lead you to the wrong conclusions in your optimization experiments.
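To avoid reusing one build directory across experiments, each flag set could get its own directory. A minimal sketch, where the naming scheme and the helper name are assumptions:

```python
import hashlib


def build_dir_for(flags, prefix="build"):
    """Derive a stable build directory name that is specific to a flag set."""
    # Flags are hashed in the order given, so reordering them yields a different name.
    digest = hashlib.sha256(" ".join(flags).encode("utf-8")).hexdigest()[:8]
    return f"{prefix}_{digest}"


for flags in (["-O2"], ["-fast", "-parallel", "-axCORE-AVX512"]):
    print(build_dir_for(flags), flags)
```

Each name can then be passed to meson setup as a fresh directory, so no experiment inherits artifacts from another one.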

Anyway, compiler optimization is a tricky topic. I usually recommend using optimized binaries provided by the project maintainers. For xtb we have those already; for qcxms we might establish them in the future.

I have a dedicated build machine, and a full compile takes about 1:30 min, so as long as it stays under 1 h I actually don't mind longer compile times. But 1.5 min is pretty fast IMO. As for the tests, I agree. I think some of the QM tools have 400 test units that check all kinds of settings for correctness. That's good software practice, and if that takes an hour, that is fine too.

There are also some open meson issues (mesonbuild/meson#6180 and mesonbuild/meson#7325); I am not sure whether they are relevant. I always have to remove everything completely, including the subprojects, otherwise it will not build.

I agree xtb is fast, but with qcxms we are also dealing with multiple MD steps in each trajectory x1000. So average molecules of interest, at 500-600 Dalton, take me 1-2 days to compute. My problem is of course that we have 100k-200k small molecules of interest, so at one day of compute time each, that would be multiple centuries (200k days is ~547 years). So unless we can get 1000-fold compute capacity, which is not easy, we will not be able to churn through those molecules. Plus, of course, the rest of the 100 million molecules from PubChem are also waiting. So if a simple compiler setting can give me some more speed, I will be very happy with that. Again, I really appreciate all the help.
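The estimate above can be reproduced with a short calculation. The sketch assumes perfect scaling across workers and a uniform cost per molecule; the function name and the example speedup and worker counts are illustrative only.

```python
def wall_clock_years(n_molecules, days_per_molecule, speedup=1.0, n_workers=1):
    """Estimate wall-clock years, assuming ideal scaling over workers."""
    total_days = n_molecules * days_per_molecule / (speedup * n_workers)
    return total_days / 365.25


baseline = wall_clock_years(200_000, 1.0)
with_speedup = wall_clock_years(200_000, 1.0, speedup=2.0)
with_cluster = wall_clock_years(200_000, 1.0, speedup=2.0, n_workers=1000)
print(f"{baseline:.0f} y, {with_speedup:.0f} y, {with_cluster:.2f} y")
```

Under these assumptions a twofold compiler speedup only halves the total, while the number of workers dominates the result.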