p12tic/libsimdpp

Feature request: Add compile-time blend masks


The blend intrinsics

  • __m256i _mm256_blendv_epi8(__m256i v1, __m256i v2, __m256i mask)
  • __m256i _mm256_blend_epi16(__m256i a, __m256i b, const int imm8)
  • __m256i _mm256_blend_epi32(__m256i a, __m256i b, const int imm8)

have different performance characteristics. Of these, _mm256_blend_epi32() is the fastest, but its mask must be encoded as a const int imm8 at compile time. If I understand correctly, that prevents its use in the current libsimdpp blend implementation (see also #56).
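To make the imm8 encoding concrete, here is a scalar sketch of the selection rule that _mm256_blend_epi32() applies: bit i of imm8 picks the i-th 32-bit element from b when set, and from a otherwise. The names vec8x32 and blend_epi32_model are hypothetical and exist only for illustration; this models the semantics in plain C++ and does not call the intrinsic.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 256-bit vector of eight 32-bit elements.
using vec8x32 = std::array<std::uint32_t, 8>;

// Scalar model of the _mm256_blend_epi32 selection rule:
// element i comes from b if bit i of imm8 is set, otherwise from a.
inline vec8x32 blend_epi32_model(const vec8x32& a, const vec8x32& b, int imm8) {
    vec8x32 r{};
    for (std::size_t i = 0; i < 8; ++i) {
        r[i] = ((imm8 >> i) & 1) ? b[i] : a[i];
    }
    return r;
}
```

With the real intrinsic, imm8 has to be a compile-time constant, which is why a mask that only exists as a runtime vector cannot be passed to it directly.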

For masks that are already known at compile time, I think it would be good to represent them differently. For instance, the blend mask could be represented as a boost::hana tuple:

    auto mask = hana::make_tuple(
      hana::true_c,  hana::true_c,
      hana::true_c,  hana::true_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::true_c,  hana::true_c, 
      hana::true_c,  hana::true_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c
    );

The immediate mask for _mm256_blend_epi32() could then be computed at compile time.
I made a proof-of-concept implementation of this in

https://github.com/eriksjolund/compile-time-simd-blend-mask
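As a rough illustration of the idea (not taken from the linked repository, and using a plain C++17 template parameter pack instead of a boost::hana tuple to stay dependency-free), a per-byte compile-time mask could be folded into an imm8 roughly like this. The names byte_mask, is_epi32_compatible, epi32_imm8 and example_mask are hypothetical. I am assuming the mask has one boolean per byte, so four consecutive entries form one 32-bit element.

```cpp
#include <array>
#include <cstddef>

// Hypothetical compile-time mask: one boolean per byte of a 256-bit vector.
template <bool... Bytes>
struct byte_mask {
    static_assert(sizeof...(Bytes) == 32, "a 256-bit vector has 32 bytes");
    static constexpr std::array<bool, 32> value{Bytes...};
};

// True if every group of four bytes is uniformly selected or unselected,
// i.e. the mask can be expressed at 32-bit granularity.
template <class Mask>
constexpr bool is_epi32_compatible() {
    for (std::size_t lane = 0; lane < 8; ++lane) {
        for (std::size_t b = 1; b < 4; ++b) {
            if (Mask::value[4 * lane + b] != Mask::value[4 * lane]) {
                return false;
            }
        }
    }
    return true;
}

// Folds the byte mask into an 8-bit immediate: bit i is set when the
// i-th 32-bit element is selected.
template <class Mask>
constexpr int epi32_imm8() {
    static_assert(is_epi32_compatible<Mask>(),
                  "mask is not uniform within each 32-bit element");
    int imm = 0;
    for (std::size_t lane = 0; lane < 8; ++lane) {
        if (Mask::value[4 * lane]) {
            imm |= 1 << lane;
        }
    }
    return imm;
}

// The mask from the example above: bytes 0-3 and 12-15 are selected.
using example_mask = byte_mask<
    true,  true,  true,  true,  false, false, false, false,
    false, false, false, false, true,  true,  true,  true,
    false, false, false, false, false, false, false, false,
    false, false, false, false, false, false, false, false>;

// Elements 0 and 3 are selected, so bits 0 and 3 are set.
static_assert(epi32_imm8<example_mask>() == 0x09, "unexpected immediate");
```

A blend implementation could then dispatch on is_epi32_compatible: pass epi32_imm8<Mask>() to _mm256_blend_epi32() when it holds, and fall back to _mm256_blendv_epi8() otherwise.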