Altivec option is misleading, it should be VSX; also, why only ppc64le, but not ppc64 (for supported CPUs)?

Question

barracuda156 opened this issue 10 months ago · 8 comments

What CMakeLists calls AltiVec is in fact VSX, which is a later ISA extension. Using the AltiVec name is misleading.

This is on a system where Altivec is supported (but VSX is not):

/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:41:9: error: '__builtin_vsx_stxvw4x_v16qi' requires the '-mvsx' option
   41 |         vec_xst(xmm0[i], j + i * total_elements, dest);
      |         ^~~~~~~
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:41:9: note: overloaded builtin '__builtin_vec_vsx_st' is implemented by builtin '__builtin_vsx_stxvw4x_v16qi'
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c: In function 'shuffle4_altivec':
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:57:7: error: '__builtin_vsx_lxvw4x_v16qi' requires the '-mvsx' option
   57 |       xmm0[i] = vec_xl(bytesoftype * j + 16 * i, src);
      |       ^~~~
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:57:7: note: overloaded builtin '__builtin_vec_vsx_ld' is implemented by builtin '__builtin_vsx_lxvw4x_v16qi'
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:65:9: error: '__builtin_vsx_stxvw4x_v16qi' requires the '-mvsx' option
   65 |         vec_xst(xmm0[i], j + i*total_elements, dest);
      |         ^~~~~~~

The code explicitly uses VSX built-ins.

Is there some reason why VSX is allowed only for the little-endian version (ppc64le)? Big-endian ppc64 supports VSX instructions as well, though only starting from ISA 2.06. (So it should not be enabled without a check, of course.)
If a check for the supported ISA is undesirable, VSX could be enabled via a non-default configure option, so that those who run VSX-capable POWER hardware in big-endian mode can benefit from the hardware's capabilities.
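
For illustration, a compile-time check along these lines could separate the two instruction sets without reference to endianness. This is only a sketch: it relies on GCC and Clang predefining `__VSX__` when `-mvsx` is in effect and `__ALTIVEC__` when `-maltivec` is in effect, and the enum and function names are hypothetical, not identifiers from c-blosc2.

```c
#include <stdio.h>

/* Hypothetical tiers for this sketch; not names used by c-blosc2. */
enum ppc_simd { PPC_SIMD_NONE = 0, PPC_SIMD_ALTIVEC = 1, PPC_SIMD_VSX = 2 };

/* Report which vector ISA the compiler may emit for this translation unit.
   __VSX__ is checked first because VSX (ISA 2.06+) is the later extension;
   a plain AltiVec/VMX target defines only __ALTIVEC__. */
enum ppc_simd ppc_simd_compiled(void) {
#if defined(__VSX__)
  return PPC_SIMD_VSX;
#elif defined(__ALTIVEC__)
  return PPC_SIMD_ALTIVEC;
#else
  return PPC_SIMD_NONE;
#endif
}

/* Human-readable name for a tier, for logging or configure output. */
const char *ppc_simd_name(enum ppc_simd level) {
  switch (level) {
    case PPC_SIMD_VSX:     return "VSX";
    case PPC_SIMD_ALTIVEC: return "AltiVec";
    default:               return "none";
  }
}

/* Print the tier selected at compile time. */
void ppc_simd_report(void) {
  printf("vector ISA enabled at compile time: %s\n",
         ppc_simd_name(ppc_simd_compiled()));
}
```

Calling `ppc_simd_report()` from a test program built with and without `-mvsx` would show which branch the preprocessor takes. Nothing here depends on byte order, so the same check would apply to ppc64 and ppc64le alike.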

Answer 1 · 2024-03-18T22:41:26.000Z

By the way, is it possible to support actual AltiVec as a fallback? I.e. ISA 2.02.

Answer 2 · 2024-03-19T07:07:49.000Z

IIRC @kif is the author of the VSX code. He might shed some light on this.

Answer 3 · 2024-06-07T16:50:30.000Z

Closing due to inactivity.

Answer 4 · 2024-06-07T20:00:58.000Z

@kif Any update on this?

Answer 5 · 2024-06-08T06:18:05.000Z

Sorry for the delay. I was not aware that VSX had more instructions than AltiVec (actually VMX); I thought it just had more registers. So I agree that the test should check for the presence of the VSX instructions rather than VMX. The file names should be changed as well.
One can get inspiration from:
https://bugzilla.mozilla.org/show_bug.cgi?id=1629414

While I have access to a Power9, I do not have access to older big-endian versions of those machines. This issue should be reopened.

Answer 6 · 2024-06-08T06:37:17.000Z

@kif If you or someone else could propose an AltiVec-compatible fallback, I can test it locally. (Unfortunately, I cannot write this kind of code myself.)

> While I have access to a Power9, I have not access to elder BigEndian version of those computers. One should re-open this issue.

All Power CPUs are in fact bi-endian, and perhaps you could also virtualize a big-endian system on a little-endian host without loss of speed.
This won't help on its own with compatibility for earlier ISAs, but it should make it possible to test the code for modern big-endian systems (OpenBSD and FreeBSD run on Power9, AFAIK).

Answer 7 · 2024-06-08T07:46:21.000Z

I did that a long time ago and debugged it on the architecture I had access to (Power9).
I guess the compilation would have gone through if the instructions had been available.
All this part of the code does is reshuffle bytes/bits. If those instructions are not available, the code should silently fall back to the pure C implementation.
Since it does not, one should just tidy up the code and check for the presence of this VSX variable.
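
As a rough sketch of that fallback, a dispatcher could choose the vector kernel only when the compiler reports VSX and otherwise use a portable loop. The names below (`SKETCH_HAVE_VSX`, `shuffle_generic_sketch`, `shuffle_sketch`) are hypothetical and are not the c-blosc2 API. The VSX kernel itself is deliberately left out, so this version always runs the portable path.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang predefine __VSX__ when -mvsx is in effect. */
#if defined(__VSX__)
#define SKETCH_HAVE_VSX 1
#else
#define SKETCH_HAVE_VSX 0
#endif

/* Portable byte shuffle: byte k of every element is gathered into one
   contiguous run, so dest holds all byte-0s, then all byte-1s, and so on. */
void shuffle_generic_sketch(size_t typesize, size_t nelems,
                            const uint8_t *src, uint8_t *dest) {
  for (size_t k = 0; k < typesize; k++) {
    for (size_t i = 0; i < nelems; i++) {
      dest[k * nelems + i] = src[i * typesize + k];
    }
  }
}

/* Dispatcher: gate the vector path on VSX, not on AltiVec or endianness. */
void shuffle_sketch(size_t typesize, size_t nelems,
                    const uint8_t *src, uint8_t *dest) {
#if SKETCH_HAVE_VSX
  /* A real build would call a vec_xl/vec_xst based kernel here. This
     sketch omits it and falls through to the portable loop instead. */
#endif
  shuffle_generic_sketch(typesize, nelems, src, dest);
}
```

With this shape, an AltiVec-only target never sees `vec_xl` or `vec_xst` and simply compiles the C loop, which is the silent fallback described above.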

About emulating BE on an LE machine: I don't think Power9 or ARM (both bi-endian) have any advantage over a pure little-endian processor like x86, but maybe I am wrong.

Answer 8 · 2024-06-08T07:54:17.000Z

> About the emulation of BE on LE machine, I don't think power9 or ARM (both bi-endian) have any advantage in comparison to pure little-endian processor like x86

Power probably does, though it is not relevant for me (nothing beyond G5 hardware is available here), so I am not too sure.
There is some info from the TFF developer: https://www.talospace.com/2018/08/making-your-talos-ii-into-power-mac.html