simplify blendv functions?
JishinMaster opened this issue · 2 comments
JishinMaster commented
Dear All,
Are we sure we need the extra "vshrq" in the blendv functions?
The current version:
FORCE_INLINE __m128 _mm_blendv_ps(__m128 _a, __m128 _b, __m128 _mask)
{
    // Use a signed shift right to create a mask with the sign bit
    uint32x4_t mask =
        vreinterpretq_u32_s32(vshrq_n_s32(vreinterpretq_s32_m128(_mask), 31));
    float32x4_t a = vreinterpretq_f32_m128(_a);
    float32x4_t b = vreinterpretq_f32_m128(_b);
    return vreinterpretq_m128_f32(vbslq_f32(mask, b, a));
}
The version I have used with no problems so far (I may be wrong!):
FORCE_INLINE __m128 _mm_blendv_ps(__m128 _a, __m128 _b, __m128 _mask)
{
    float32x4_t a = vreinterpretq_f32_m128(_a);
    float32x4_t b = vreinterpretq_f32_m128(_b);
    return vreinterpretq_m128_f32(vbslq_f32(vreinterpretq_s32_m128(_mask), b, a));
}
marktwtn commented
x86: _mm_blendv_ps
FOR j := 0 to 3
    i := j*32
    IF mask[i+31]
        dst[i+31:i] := b[i+31:i]
    ELSE
        dst[i+31:i] := a[i+31:i]
    FI
ENDFOR
The selection depends only on the most significant bit of each 32-bit mask element.
arm: vbslq_f32
Bitwise Select. This instruction sets each bit in the destination SIMD&FP register
to the corresponding bit from the first source SIMD&FP register
when the original destination bit was 1, otherwise from the second source SIMD&FP register.
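The selection is made independently for each bit.
Therefore, the vshrq_n_s32 is necessary: the arithmetic shift right by 31 copies the sign bit into all 32 bits of each lane, so the bitwise select takes the whole lane from one source.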
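To make the difference concrete, here is a minimal scalar sketch of a single 32-bit lane. It is not the sse2neon code: the helper names bitwise_select, blendv_lane and broadcast_sign are hypothetical and only model, on plain integers, what vbslq_f32, _mm_blendv_ps and vshrq_n_s32(..., 31) do per lane.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of vbslq: every result bit is taken
   from b where the mask bit is 1 and from a where the mask bit is 0. */
static uint32_t bitwise_select(uint32_t mask, uint32_t b, uint32_t a)
{
    return (mask & b) | (~mask & a);
}

/* Hypothetical scalar model of one lane of _mm_blendv_ps: only bit 31 of
   the mask decides, and the whole lane comes from either b or a. */
static uint32_t blendv_lane(uint32_t a, uint32_t b, uint32_t mask)
{
    return (mask >> 31) ? b : a;
}

/* Hypothetical scalar model of vshrq_n_s32(mask, 31): the sign bit is
   replicated into all 32 bits. Written without a signed shift so the
   behaviour does not depend on the compiler. */
static uint32_t broadcast_sign(uint32_t mask)
{
    return (mask >> 31) ? 0xFFFFFFFFu : 0u;
}
```

With a = 0x3F800000 (1.0f), b = 0x40000000 (2.0f) and mask = 0x80000000 (-0.0f, sign bit only), blendv_lane returns b, while bitwise_select on the raw mask returns a unchanged, because bit 31 is the only bit taken from b and it is 0 in both inputs. Passing broadcast_sign(mask) to bitwise_select gives b again, matching the x86 behaviour.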
JishinMaster commented
Okay.
That would explain why it works for me, since I use it mostly after compare instructions, which set every bit of a lane to 1 (0xFFFFFFFF) when true.
Thanks