Feature request: Add compile-time blend masks
eriksjolund commented
The blend intrinsics
- __m256i _mm256_blendv_epi8(__m256i v1, __m256i v2, __m256i mask)
- __m256i _mm256_blend_epi16(__m256i a, __m256i b, const int imm8)
- __m256i _mm256_blend_epi32(__m256i a, __m256i b, const int imm8)
have different performance characteristics. Among them, _mm256_blend_epi32() is the fastest, but its mask must be encoded as a const int imm8 at compile time. If I understand correctly, that prevents the blend implementation in the current libsimdpp from using it (see also #56).
For masks that are already known at compile time, I think it would be good to represent them in a new way. For instance, the blend mask could be represented as a boost::hana tuple:
auto mask = hana::make_tuple(
    hana::true_c, hana::true_c,
    hana::true_c, hana::true_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::true_c, hana::true_c,
    hana::true_c, hana::true_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c,
    hana::false_c, hana::false_c
);
The immediate mask for _mm256_blend_epi32() could then be computed at compile time, with one bit per 32-bit lane.
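To make the idea concrete, here is a minimal sketch of how such an immediate might be derived. It is not libsimdpp code and does not use boost::hana: to stay dependency-free it swaps the hana tuple for a plain bool template parameter pack. The names static_byte_mask, dword_uniform, blend_epi32_imm8 and blend_static are hypothetical. It also assumes the first flag corresponds to byte lane 0 and that true selects the second operand.

```cpp
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical compile-time mask type (illustration only, not libsimdpp API).
// One flag per byte lane of a 256-bit vector; the first flag is assumed to be
// byte lane 0, and true is assumed to select the second operand.
template <bool... Bytes>
struct static_byte_mask {
    static_assert(sizeof...(Bytes) == 32, "a 256-bit vector has 32 byte lanes");
};

// True if every aligned group of four byte flags holds a single value,
// i.e. the mask can be expressed at 32-bit granularity.
template <bool... Bytes>
constexpr bool dword_uniform(static_byte_mask<Bytes...>) {
    const bool b[] = {Bytes...};
    for (std::size_t i = 0; i < sizeof...(Bytes); i += 4) {
        if (b[i] != b[i + 1] || b[i] != b[i + 2] || b[i] != b[i + 3]) {
            return false;
        }
    }
    return true;
}

// Packs one bit per 32-bit lane into the immediate for _mm256_blend_epi32.
// Only the first flag of each group is read, so check dword_uniform first.
template <bool... Bytes>
constexpr int blend_epi32_imm8(static_byte_mask<Bytes...>) {
    const bool b[] = {Bytes...};
    int imm = 0;
    for (std::size_t i = 0; i < sizeof...(Bytes); i += 4) {
        if (b[i]) {
            imm |= 1 << static_cast<int>(i / 4);
        }
    }
    return imm;
}

// The mask from this issue, written as a parameter pack.
using example_mask = static_byte_mask<
    true,  true,  true,  true,
    false, false, false, false,
    false, false, false, false,
    true,  true,  true,  true,
    false, false, false, false,
    false, false, false, false,
    false, false, false, false,
    false, false, false, false>;

static_assert(dword_uniform(example_mask{}), "mask is uniform per 32-bit lane");
static_assert(blend_epi32_imm8(example_mask{}) == 0x09, "lanes 0 and 3 selected");

#if defined(__AVX2__)
// Hypothetical wrapper: the immediate is a constant expression, so the
// imm8 form of the blend can be used.
template <bool... Bytes>
inline __m256i blend_static(__m256i a, __m256i b, static_byte_mask<Bytes...>) {
    static_assert(dword_uniform(static_byte_mask<Bytes...>{}),
                  "mask is not uniform per 32-bit lane");
    constexpr int imm8 = blend_epi32_imm8(static_byte_mask<Bytes...>{});
    return _mm256_blend_epi32(a, b, imm8);
}
#endif
```

A mask that is not uniform per 32-bit lane fails the static_assert in this sketch. A fuller implementation could instead fall back to _mm256_blend_epi16() or _mm256_blendv_epi8() for such masks.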
I made a proof-of-concept implementation of this in