The goal of this project is to optimize the current Universal Intrinsic (UI) implementation for RVV, focusing on the following two points.
The wrapper classes introduce unnecessary load and store instructions, which results in redundant data movement between memory and registers.
Each vector register in RVV holds a fixed VLEN bits of state, but VLEN can differ between RVV hardware devices. Multiple (or scalable) VLENs are one of the biggest differences between RVV and existing SIMD architectures. However, the current RVV UI implementation supports only VLEN=128.
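To illustrate why a fixed VLEN=128 is limiting, here is a minimal, portable C++ sketch of a loop whose step depends on the register width instead of a hard-coded lane count. It is not taken from the RVV UI code: `lanes_for_vlen` and `add_vla` are hypothetical names, and passing VLEN as a function argument is an assumption made so the example runs on any machine. On real RVV hardware the lane count would be queried at run time.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: how many elements of `elem_bits` fit in one vector
// register of `vlen_bits`. VLEN is passed in here only for illustration.
constexpr std::size_t lanes_for_vlen(std::size_t vlen_bits, std::size_t elem_bits) {
    return vlen_bits / elem_bits;
}

// Hypothetical vector-length-agnostic add: the step is derived from VLEN,
// so the same code covers VLEN=128 (4 float lanes) and VLEN=256 (8 float lanes).
std::vector<float> add_vla(const std::vector<float>& a,
                           const std::vector<float>& b,
                           std::size_t vlen_bits) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::size_t lanes = lanes_for_vlen(vlen_bits, 32);
    if (lanes == 0) lanes = 1;  // guard against a VLEN smaller than one element
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; i += lanes) {
        // The last iteration may cover fewer than `lanes` elements (tail).
        const std::size_t step = (n - i < lanes) ? (n - i) : lanes;
        // This inner loop stands in for a single vector instruction.
        for (std::size_t j = 0; j < step; ++j) {
            out[i + j] = a[i + j] + b[i + j];
        }
    }
    return out;
}
```

Code that assumes four 32-bit lanes is only correct for VLEN=128; deriving the step from VLEN, as in this sketch, gives the same result on wider registers while processing more elements per iteration.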