Universal Intrinsic for RISC-V Vector

The goal of this project is to optimize the current Universal Intrinsic (UI) implementation for RVV, the RISC-V Vector extension. The work focuses on the following two points.

Reduce the overhead of using wrapper class

The wrapper class causes the compiler to emit unnecessary load and store instructions, which redundantly move data between memory and vector registers.
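To make the overhead concrete, here is a minimal sketch of how a wrapper class can force a memory round trip. It assumes the wrapper keeps its lanes in a plain array member; the names `WrappedF32x4` and `wrapped_add` are hypothetical and are not taken from the project. The RVV path assumes the `__riscv_`-prefixed intrinsics from `<riscv_vector.h>` and VLEN >= 128; on other targets a scalar fallback keeps the example compilable.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical wrapper: the four float lanes live in memory, not in a register.
struct WrappedF32x4 {
    float val[4];
};

// Every operation reloads its operands and stores its result again.
inline WrappedF32x4 wrapped_add(const WrappedF32x4& a, const WrappedF32x4& b) {
    WrappedF32x4 r;
#if defined(__riscv_vector)
    const std::size_t vl = 4;                              // assumes VLEN >= 128
    vfloat32m1_t va = __riscv_vle32_v_f32m1(a.val, vl);    // load operand a
    vfloat32m1_t vb = __riscv_vle32_v_f32m1(b.val, vl);    // load operand b
    vfloat32m1_t vr = __riscv_vfadd_vv_f32m1(va, vb, vl);  // the only useful work
    __riscv_vse32_v_f32m1(r.val, vr, vl);                  // store result to memory
#else
    // Scalar fallback so the sketch builds on non-RVV targets.
    for (std::size_t i = 0; i < 4; ++i) r.val[i] = a.val[i] + b.val[i];
#endif
    return r;
}
```

In a chain such as `wrapped_add(wrapped_add(a, b), c)`, the intermediate result is written to memory and read back unless the compiler manages to eliminate the round trip. Keeping values in native vector types between operations avoids this.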

Support for multiple VLENs

Each vector register in RVV holds a fixed VLEN bits of state, but different RVV hardware implementations can have different VLEN values. Multiple (or scalable) VLENs are one of the biggest differences between RVV and existing SIMD architectures. However, the current RVV UI implementation supports only VLEN=128.
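As an illustration of what VLEN-agnostic code looks like, the following saxpy sketch asks the hardware how many elements to process in each iteration instead of assuming four 32-bit lanes. It is written directly against the `__riscv_`-prefixed RVV intrinsics, not the Universal Intrinsic API, and it is not the project's own example. A scalar fallback is included as an assumption so the code also builds on targets without RVV.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Computes y[i] += a * x[i] for i in [0, n).
inline void saxpy(std::size_t n, float a, const float* x, float* y) {
#if defined(__riscv_vector)
    for (std::size_t vl; n > 0; n -= vl, x += vl, y += vl) {
        // The hardware picks vl <= n based on its own VLEN.
        vl = __riscv_vsetvl_e32m1(n);
        vfloat32m1_t vx = __riscv_vle32_v_f32m1(x, vl);
        vfloat32m1_t vy = __riscv_vle32_v_f32m1(y, vl);
        vy = __riscv_vfmacc_vf_f32m1(vy, a, vx, vl);  // vy += a * vx
        __riscv_vse32_v_f32m1(y, vy, vl);
    }
#else
    // Scalar fallback for non-RVV targets.
    for (std::size_t i = 0; i < n; ++i) y[i] += a * x[i];
#endif
}
```

The same binary runs on VLEN=128, VLEN=256, or wider hardware, because the loop never hard-codes the number of lanes. Code that fixes the lane count at four, as a VLEN=128-only implementation does, leaves the extra register width unused on wider hardware.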

hanliutong/rvv-ui

Here is a saxpy example on Godbolt.