Suggest a GCC/LLVM compatible compile option for RVV auto-vectorization

Question

zhongjuzhe opened this issue 2 years ago · 6 comments

Is it possible to have a GCC/LLVM-compatible compile option to specify LMUL in auto-vectorization?
For example, -mriscv-vector-lmul or -mrvv-vector-lmul?

Thanks.

Answer 1 · 2023-03-28T01:59:45.000Z

@kito-cheng Can you help me with that?

Answer 2 · 2023-03-29T01:16:43.000Z

LLVM has an option for that, -mllvm -riscv-v-register-bit-width-lmul=N, but I think we need to work out and define the semantics of the option, especially for loops that contain more than one element width. For example, if one loop contains both i64 and i32, which LMUL should be used for each when we specify -mrvv-vector-lmul=m2? Should i32 use m2 and i64 use m4, or should i32 use m1 and i64 use m2?

We'll need to clarify those cases if we want to define a common option.
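To make the two readings concrete, here is a minimal sketch in C. It assumes the vectorizer keeps the same number of elements per register group across element widths (so the LMUL/SEW ratio is constant within a loop), which is the assumption behind both pairings mentioned above. The names `widen_accumulate` and `lmul_for_sew` and the eighths encoding are hypothetical and purely illustrative; neither compiler is known to expose anything like this.

```c
#include <stddef.h>
#include <stdint.h>

/* A loop that mixes element widths: i32 loads, i64 arithmetic and stores. */
void widen_accumulate(int64_t *restrict dst, const int32_t *restrict src,
                      size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* LMUL expressed in eighths so that fractional values stay integral. */
enum {
    LMUL_MF8 = 1, LMUL_MF4 = 2, LMUL_MF2 = 4,
    LMUL_M1 = 8, LMUL_M2 = 16, LMUL_M4 = 32, LMUL_M8 = 64
};

/* Hypothetical helper: the LMUL (in eighths) for elements of `sew` bits
 * when the requested LMUL is pinned to elements of `anchor_sew` bits.
 * Assumes a constant LMUL/SEW ratio within the loop.
 * Returns 0 when the result is not a legal LMUL (outside mf8..m8). */
unsigned lmul_for_sew(unsigned requested, unsigned anchor_sew, unsigned sew) {
    unsigned scaled = requested * sew;
    if (anchor_sew == 0 || scaled % anchor_sew != 0)
        return 0;
    unsigned lmul = scaled / anchor_sew;
    if (lmul < LMUL_MF8 || lmul > LMUL_M8)
        return 0;
    return lmul;
}
```

With a requested LMUL of m2 on the loop above, anchoring to the widest element, `lmul_for_sew(LMUL_M2, 64, 32)`, gives m1 for i32 while i64 stays at m2. Anchoring to the narrowest element, `lmul_for_sew(LMUL_M2, 32, 64)`, gives m4 for i64 while i32 stays at m2. The second reading can also run out of legal LMUL values: m8 anchored to i32 has no valid LMUL for i64, so the helper returns 0.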

Answer 3 · 2023-03-29T01:23:33.000Z

Tagging some LLVM developers here to get feedback on this idea: @topperc @asb @rofirrim @preames

@zhongjuzhe is the main contributor to vector support in RISC-V GCC, including the intrinsics and the vectorizer. He has already finished a first version of auto-vectorization in their downstream GCC, and we are planning to support the vectorizer on GCC trunk in the second half of 2023.

Answer 4 · 2023-03-29T12:14:36.000Z

I'd prefer that i64 use m2 and i32 use m1 when -mrvv-vector-lmul=m2 is specified.

Answer 5 · 2023-03-29T19:56:56.000Z

I strongly think it's too early to have this discussion. If you have specific examples where you think the default code generation is sub-optimal, please file bugs. Only once we've implemented reasonable code quality in both compilers does it start to make sense to provide compatible forcing options.

Also, any option prefixed with -mllvm is explicitly internal and is not stable. Use it at your own risk; you will get no support if it breaks.

Answer 6 · 2023-12-01T21:40:29.000Z

I'm guessing we can restart this conversation. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112651

TL;DR: in GCC we currently use --param to convey LMUL, but that is more of a hint to the optimizer than a mandate. For the latter, the preferred way is to use a -mXX toggle.