Consider host-architecture aware compiler optimization
stv0g opened this issue · 4 comments
By passing the correct -march flag:
- https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
- https://wiki.gentoo.org/wiki/GCC_optimization#-march
Especially for -march=native:
This selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine. Using -march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines). Using -mtune=native produces code optimized for the local machine under the constraints of the selected instruction set.
The default is usually x86-64: "A generic CPU with 64-bit extensions."
This means that no instruction subsets beyond the x86-64 baseline (AVX, for example) are enabled.
Also interesting are x86-64-v2, x86-64-v3 and x86-64-v4:
These choices for cpu-type select the corresponding micro-architecture level from the x86-64 psABI. On ABIs other than the x86-64 psABI they select the same CPU features as the x86-64 psABI documents for the particular micro-architecture level.
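To see which level a given set of flags actually selects, one option is to inspect the feature macros that GCC and Clang predefine from -march. The following is only a rough sketch: the function names are made up for illustration, and each level is detected from a representative subset of its features rather than the full psABI list.

```c
#include <assert.h>
#include <stdio.h>

/* Estimate the x86-64 micro-architecture level this translation unit was
   compiled for, using macros that GCC and Clang predefine from -march.
   Hypothetical helper: only a subset of each level's features is checked.
   Returns 1..4 on x86-64 and 0 on any other target. */
int compiled_x86_64_level(void) {
#if !defined(__x86_64__)
    return 0;
#else
    int level = 1; /* baseline x86-64 */
#if defined(__SSE3__) && defined(__SSSE3__) && defined(__SSE4_1__) && \
    defined(__SSE4_2__) && defined(__POPCNT__)
    level = 2;
#if defined(__AVX__) && defined(__AVX2__) && defined(__FMA__) && \
    defined(__BMI__) && defined(__BMI2__) && defined(__F16C__) && \
    defined(__LZCNT__) && defined(__MOVBE__)
    level = 3;
#if defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512CD__) && \
    defined(__AVX512DQ__) && defined(__AVX512VL__)
    level = 4;
#endif
#endif
#endif
    return level;
#endif
}

/* Write a one-line summary of the detected level to the given stream. */
void print_compiled_level(FILE *out) {
    assert(out != NULL);
    fprintf(out, "compiled for x86-64 level: %d\n", compiled_x86_64_level());
}
```

Calling print_compiled_level(stdout) from a small main and building it once with the default flags and once with -march=native (or -march=x86-64-v3) should make the difference visible on a machine that supports the newer levels.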
I just saw the compiler flags which OPAL-RT is using to compile their model in RT-LAB:
gcc -c -O3 -ffast-math -mtune=native -march=native -falign-loops=2 -falign-jumps=2 -falign-functions=2 -m64
-ffast-math is interesting. I also learned something new here:
@dinkelbachjan @m-mirz I think these compiler flags could have a considerable impact on DPsim's performance.
Do we have a profiling/benchmark script which I could use to verify this?
Or maybe a student who could run some comparisons using these flags?
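Any such comparison should probably check numerical results as well as run time, because -ffast-math allows the compiler to reorder floating-point operations. A minimal sketch of the kind of code that is sensitive to this, not taken from DPsim or RT-LAB (all function names are illustrative): Kahan summation relies on a term that is algebraically zero, so a compiler that is allowed to reassociate may simplify the compensation away.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if this translation unit was compiled with -ffast-math
   (GCC and Clang define __FAST_MATH__ in that case), otherwise 0. */
int built_with_fast_math(void) {
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Plain left-to-right summation. */
double naive_sum(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Kahan (compensated) summation. The compensation term is algebraically
   zero, so it only survives if the compiler keeps the evaluation order. */
double kahan_sum(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    double sum = 0.0;
    double comp = 0.0; /* running estimate of the lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - comp;
        double t = sum + y;
        comp = (t - sum) - y; /* rounding error of the addition above */
        sum = t;
    }
    return sum;
}
```

Feeding both functions an array that starts with 1.0 followed by many values around 1e-16, and comparing the results across builds with and without -ffast-math, is one way to check whether the flag changes the numerics of a given toolchain.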