jfalcou/eve

[FEATURE] shuffle_v2 - see if we can rely on the compiler for `mask`, `maskz`


At the moment, the effort to support `maskz` versions of operations is:

a) duplicated:

    else if constexpr( !P::has_zeroes )
    {
      // No zeroed lanes requested: plain alignr.
      static_assert(P::reg_size > 16, "sanity check - sse alignr is better");
      constexpr std::ptrdiff_t shift_epi32 = *starts_from * P::g_size / 4;
      constexpr std::ptrdiff_t shift_epi64 = *starts_from * P::g_size / 8;
      if constexpr( P::reg_size == 32 )
      {
        if constexpr( P::g_size >= 8 ) return _mm256_alignr_epi64(y, x, shift_epi64);
        else return _mm256_alignr_epi32(y, x, shift_epi32);
      }
      else
      {
        if constexpr( P::g_size >= 8 ) return _mm512_alignr_epi64(y, x, shift_epi64);
        else return _mm512_alignr_epi32(y, x, shift_epi32);
      }
    }
    else
    {
      // Same selection again, this time with the zeroing mask threaded through.
      constexpr std::ptrdiff_t shift_epi32 = *starts_from * P::g_size / 4;
      constexpr std::ptrdiff_t shift_epi64 = *starts_from * P::g_size / 8;
      auto mask = is_na_or_we_logical_mask(p, g, as(x)).storage();
      if constexpr( P::reg_size == 16 )
      {
        if constexpr( P::g_size >= 8 ) return _mm_maskz_alignr_epi64(mask, y, x, shift_epi64);
        else return _mm_maskz_alignr_epi32(mask, y, x, shift_epi32);
      }
      else if constexpr( P::reg_size == 32 )
      {
        if constexpr( P::g_size >= 8 ) return _mm256_maskz_alignr_epi64(mask, y, x, shift_epi64);
        else return _mm256_maskz_alignr_epi32(mask, y, x, shift_epi32);
      }
      else
      {
        if constexpr( P::g_size >= 8 ) return _mm512_maskz_alignr_epi64(mask, y, x, shift_epi64);
        else return _mm512_maskz_alignr_epi32(mask, y, x, shift_epi32);
      }
    }

b) untested (I only concerned myself with explicit names)

c) incomplete: the `mask` cases that take a source register are not addressed at all.

=====================

I suspect the compiler can merge a non-masked operation followed by a blend into a single masked operation.

So this needs to be checked for sve and avx512.
If the compilers can't do it, bugs should be filed.
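A minimal sketch of the kind of pair one could paste into Compiler Explorer for the avx512 side: the hand-written `maskz` intrinsic next to the unmasked intrinsic followed by a zeroing move. The function names, the fixed shift of 4 and the `SKETCH_X86_AVX512` macro are made up for illustration and are not taken from eve. Whether the second form compiles down to a single masked instruction has to be read off the disassembly; the check below only confirms that both forms compute the same lanes.

```cpp
#include <cassert>
#include <cstdint>

// Only GCC/Clang on x86-64 are covered by this sketch (assumption: the
// per-function target attribute is available there).
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#  include <immintrin.h>
#  define SKETCH_X86_AVX512 1
#else
#  define SKETCH_X86_AVX512 0
#endif

#if SKETCH_X86_AVX512
// Form 1: the explicit masked intrinsic.
// Lanes 0..11 come from x[4..15], lanes 12..15 from y[0..3];
// lanes whose mask bit is 0 are zeroed.
__attribute__((target("avx512f")))
void alignr_maskz_explicit(std::uint16_t m, const std::int32_t* x,
                           const std::int32_t* y, std::int32_t* out)
{
  __m512i vx = _mm512_loadu_si512(x);
  __m512i vy = _mm512_loadu_si512(y);
  __m512i r  = _mm512_maskz_alignr_epi32(static_cast<__mmask16>(m), vy, vx, 4);
  _mm512_storeu_si512(out, r);
}

// Form 2: unmasked operation, then a separate zeroing step.
// The hope is that the compiler folds these two into form 1.
__attribute__((target("avx512f")))
void alignr_then_zero(std::uint16_t m, const std::int32_t* x,
                      const std::int32_t* y, std::int32_t* out)
{
  __m512i vx = _mm512_loadu_si512(x);
  __m512i vy = _mm512_loadu_si512(y);
  __m512i r  = _mm512_alignr_epi32(vy, vx, 4);
  r          = _mm512_maskz_mov_epi32(static_cast<__mmask16>(m), r);
  _mm512_storeu_si512(out, r);
}
#endif

// Returns true when both forms produce identical lanes.
// Without AVX-512 hardware (or on other compilers/targets) there is
// nothing to run, so this returns true without checking anything.
bool both_forms_agree(std::uint16_t m)
{
#if SKETCH_X86_AVX512
  if( !__builtin_cpu_supports("avx512f") ) return true;
  std::int32_t x[16], y[16], a[16], b[16];
  for( int i = 0; i < 16; ++i ) { x[i] = i; y[i] = 100 + i; }
  alignr_maskz_explicit(m, x, y, a);
  alignr_then_zero(m, x, y, b);
  for( int i = 0; i < 16; ++i )
    if( a[i] != b[i] ) return false;
  return true;
#else
  (void)m;
  return true;
#endif
}

// Small self-check over a few mask patterns.
inline void run_check()
{
  assert(both_forms_agree(0x0000));
  assert(both_forms_agree(0x0F0F));
  assert(both_forms_agree(0xFFFF));
}
```

For the codegen question itself, the interesting part is the assembly of `alignr_then_zero` at `-O2` or higher: a single masked `valignd` would suggest the merge happens, while a `valignd` followed by a separate masked move would suggest it does not.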

If that works out, the mask(z) logic gets moved into shuffle_driver,
and all the zero handling gets removed.
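As a rough, portable model of that split (plain arrays instead of registers, so none of this is eve's actual API): the per-shuffle code ignores zeroed lanes entirely, and a driver-level wrapper zeroes them afterwards. `lanes`, `shuffle_no_zeroes`, `driver_with_zeroes` and the `na` marker are hypothetical names for this sketch; `na` is only loosely modeled on the "not available" index that `is_na_or_we_logical_mask` appears to look for.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical marker for "this output lane must be zero".
constexpr int na = -1;

template<std::size_t N>
using lanes = std::array<std::int32_t, N>;

// Per-shuffle code: knows nothing about zeroes.
// For an `na` index it may produce anything; here it just reads lane 0.
template<std::size_t N>
lanes<N> shuffle_no_zeroes(const lanes<N>& x, const std::array<int, N>& idxs)
{
  lanes<N> r{};
  for( std::size_t i = 0; i < N; ++i )
  {
    std::size_t from = idxs[i] < 0 ? 0 : static_cast<std::size_t>(idxs[i]);
    r[i]             = x[from];
  }
  return r;
}

// Driver-level code: run the unmasked shuffle, then zero the `na` lanes.
// On real hardware this second step is the blend the compiler would
// ideally merge into a masked instruction.
template<std::size_t N>
lanes<N> driver_with_zeroes(const lanes<N>& x, const std::array<int, N>& idxs)
{
  lanes<N> r = shuffle_no_zeroes(x, idxs);
  for( std::size_t i = 0; i < N; ++i )
    if( idxs[i] == na ) r[i] = 0;
  return r;
}

// Example: lane 1 is requested as zero, the rest are a permutation.
inline void driver_example()
{
  lanes<4>           x {10, 20, 30, 40};
  std::array<int, 4> idxs {1, na, 3, 0};
  lanes<4>           r = driver_with_zeroes(x, idxs);
  assert((r == lanes<4> {20, 0, 40, 10}));
}
```

With this shape each shuffle has a single code path, and the `P::has_zeroes` branches would only need to exist in one place.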

Probably somewhere after this function: shuffle_v2_driver_multiple_registers

shuffle_v2_driver_multiple_registers(NativeSelector selector,

The tests are split into two files:
test/unit/api/regular/shuffle_v2/shuffle_v2_driver.cpp
test/unit/api/regular/shuffle_v2/shuffle_v2_driver_intergration.cpp

I'm not sure which one to add to at the moment.

You will also need to clean up some P::has_zeroes from
include/eve/detail/shuffle_v2/simd/x86/shuffle_l2.hpp
include/eve/detail/shuffle_v2/simd/arm/sve/shuffle_l2.hpp

Seems like both clang and gcc can do it, at least in some cases. https://godbolt.org/z/h68WxonaT