riscv-non-isa/rvv-intrinsic-doc

vcreate intrinsics for LMUL > 1

Closed this issue · 8 comments

We have vcreate/vget/vset for tuple types, and vget/vset for LMUL > 1 types.

Why don't we have vcreate for LMUL > 1?

vfloat16m8 = __riscv_vcreate_v_f16m8 (vfloat16m1, ...);

Tuples are frequently created because users need to construct them as inputs for segment stores. vcreate is syntactic sugar, and we have it because users raised an issue saying it would help them.

On the other hand, are users motivated to construct higher-LMUL values from their results? Shouldn't they consider using a higher LMUL in the first place?

If you assume users use a higher LMUL in the first place, why do you define vget/vset for LMUL > 1?
IMHO, we'd better make the intrinsics consistent.

vget/vset are there for functional completeness. vcreate, though, is syntactic sugar.
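To make the "sugar" point concrete, here is a minimal sketch of what a vcreate for LMUL > 1 would abbreviate: a vundefined value followed by one vset per part. The helper names `sketch_fuse_i32m4` and `sketch_copy_i32` are hypothetical and not part of the intrinsics API; the `#if` guard and scalar loop are only there so the snippet also builds without the vector extension.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>

/* Hypothetical helper (not an official intrinsic): fuse four m1 vectors
 * into one m4 register group using only vundefined + vset. */
static inline vint32m4_t sketch_fuse_i32m4(vint32m1_t v0, vint32m1_t v1,
                                           vint32m1_t v2, vint32m1_t v3) {
    vint32m4_t g = __riscv_vundefined_i32m4();
    g = __riscv_vset_v_i32m1_i32m4(g, 0, v0);
    g = __riscv_vset_v_i32m1_i32m4(g, 1, v1);
    g = __riscv_vset_v_i32m1_i32m4(g, 2, v2);
    g = __riscv_vset_v_i32m1_i32m4(g, 3, v3);
    return g;
}
#endif

/* Hypothetical usage: copy n elements, moving whole m4 groups that were
 * assembled from four separately loaded m1 vectors. */
void sketch_copy_i32(int32_t *dst, const int32_t *src, size_t n) {
    size_t i = 0;
#if defined(__riscv_vector)
    const size_t vl1 = __riscv_vsetvlmax_e32m1(); /* elements per m1 register */
    for (; n - i >= 4 * vl1; i += 4 * vl1) {
        vint32m1_t a = __riscv_vle32_v_i32m1(src + i, vl1);
        vint32m1_t b = __riscv_vle32_v_i32m1(src + i + vl1, vl1);
        vint32m1_t c = __riscv_vle32_v_i32m1(src + i + 2 * vl1, vl1);
        vint32m1_t d = __riscv_vle32_v_i32m1(src + i + 3 * vl1, vl1);
        __riscv_vse32_v_i32m4(dst + i, sketch_fuse_i32m4(a, b, c, d), 4 * vl1);
    }
#endif
    /* Scalar tail; on non-RVV builds this loop does all the work. */
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

A dedicated vcreate would collapse the five statements in `sketch_fuse_i32m4` into a single call without adding any capability.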

So why doesn't LMUL > 1 have sugar too?

I guess I don't have a strong reason against this, but I think it would be good to back this with motivation.

I guess I found a motivation... https://github.com/opencv/opencv/blob/master/modules/core/include/opencv2/core/hal/intrin_rvv_scalable.hpp#L497

Yes. That's why I said we need vcreate for large LMUL.
I just remembered that a long time ago somebody asked me whether we could have vcreate, like ARM SVE has, for both large LMUL and tuples.

I just noticed you only added tuple vcreate recently.

I don't agree with the hypothesis of this issue.

I agree that the proposed vcreate for the (non-tuple) LMUL > 1 case would be syntactic sugar.

However, vcreate is not syntactic sugar for tuple types: rather, it is functionally necessary. To verify this yourself, try to initialize a tuple via a sequence of vset intrinsics, and compile with -Wuninitialized -Werror, using a recent (17.0.2+) LLVM toolchain.

The root cause of this discrepancy is that the current API provides vundefined objects for non-tuple types, but not for tuple types. (I don't know whether or not this omission was intentional.)
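A minimal sketch of the tuple case, assuming a toolchain with tuple-type support: the tuple is built with vcreate and passed straight to a segment store, so no tuple value is ever read before all of its fields are set. A vset-based version would need some initial tuple to pass as the destination operand, which is the gap described above. The function name `sketch_interleave_i32` is hypothetical, and the scalar branch is only a fallback so the snippet builds without the vector extension.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical helper: writes re[0], im[0], re[1], im[1], ... into dst.
 * dst must have room for 2 * n elements. */
void sketch_interleave_i32(int32_t *dst, const int32_t *re,
                           const int32_t *im, size_t n) {
#if defined(__riscv_vector)
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m2(n - i);
        vint32m2_t a = __riscv_vle32_v_i32m2(re + i, vl);
        vint32m2_t b = __riscv_vle32_v_i32m2(im + i, vl);
        /* Both fields are supplied at once, so there is no partially
         * initialized tuple for -Wuninitialized to complain about. */
        vint32m2x2_t t = __riscv_vcreate_v_i32m2x2(a, b);
        /* Two-field segment store: interleaves the fields in memory. */
        __riscv_vsseg2e32_v_i32m2x2(dst + 2 * i, t, vl);
        i += vl;
    }
#else
    /* Scalar fallback with the same result, for non-RVV builds. */
    for (size_t i = 0; i < n; ++i) {
        dst[2 * i] = re[i];
        dst[2 * i + 1] = im[i];
    }
#endif
}
```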

Therefore, another approach towards consistency would be to add the "missing" vundefined tuple objects, and remove vcreate.

It's also worth distinguishing register-group fusion (e.g., fusing two m2 groups into one m4 group) from tuple fusion (e.g., fusing two m2 groups into one m2x2 tuple). The latter is always portable (VLEN-agnostic), whereas the former may not be.
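One possible reading of the portability remark (an interpretation, not something the comment spells out) is that in a fused m4 group the second m2 operand starts at an element index equal to the m2 VLMAX, which scales with VLEN, while in an m2x2 tuple it is always addressed as field 1. The scalar model below only does that index arithmetic; every name in it is hypothetical and it uses no intrinsics.

```c
#include <stddef.h>

/* Elements held by one vector register, for VLEN and SEW given in bits. */
static size_t elems_per_register(size_t vlen_bits, size_t sew_bits) {
    return vlen_bits / sew_bits;
}

/* Register-group fusion model: two m2 groups concatenated into one m4 group.
 * Returns the m4 element index where element 0 of the second operand lands,
 * i.e. VLMAX of an m2 group. This value changes with VLEN. */
size_t fused_index_of_second_operand(size_t vlen_bits, size_t sew_bits) {
    return 2 * elems_per_register(vlen_bits, sew_bits);
}

/* Tuple fusion model: position inside an m2x2 tuple. */
typedef struct {
    size_t field; /* which tuple field */
    size_t elem;  /* element index within that field */
} tuple_position;

/* Element 0 of the second operand is always (field 1, element 0),
 * whatever VLEN and SEW are. */
tuple_position tuple_position_of_second_operand(size_t vlen_bits,
                                                size_t sew_bits) {
    (void)vlen_bits;
    (void)sew_bits;
    tuple_position p = {1, 0};
    return p;
}
```

Under this model, code that fuses two partially filled m2 groups and then indexes the m4 result as one contiguous vector bakes a VLEN-dependent offset into its logic, whereas code that goes through tuple fields does not.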