simd-everywhere/simde

AVX512F: _mm512_fmaddsub_pd is missing


It seems like the AVX512F _mm512_fmaddsub_pd intrinsic is missing.

Could it please be added? :)

mr-c commented

Hello @robinchrist; I would accept a PR adding simde_mm512_fmaddsub_pd in a new file named simde/x86/avx512/fmaddsub.h, and I can assist you with that if you would like. Take a look at the following:

@mr-c yeah this is way too complicated for me, honestly

The complexity lies in the addsub, e.g.

simde/simde/x86/avx.h

Lines 1734 to 1767 in fa6a869

SIMDE_FUNCTION_ATTRIBUTES
simde__m256d
simde_mm256_addsub_pd (simde__m256d a, simde__m256d b) {
  #if defined(SIMDE_X86_AVX_NATIVE)
    return _mm256_addsub_pd(a, b);
  #else
    simde__m256d_private
      r_,
      a_ = simde__m256d_to_private(a),
      b_ = simde__m256d_to_private(b);
    #if defined(SIMDE_LOONGARCH_LASX_NATIVE)
      simde__m256d_private aev_, aod_, bev_, bod_;
      aev_.i256 = __lasx_xvpickev_d(a_.i256, a_.i256);
      aod_.i256 = __lasx_xvpickod_d(a_.i256, a_.i256);
      bev_.i256 = __lasx_xvpickev_d(b_.i256, b_.i256);
      bod_.i256 = __lasx_xvpickod_d(b_.i256, b_.i256);
      aev_.d256 = __lasx_xvfsub_d(aev_.d256, bev_.d256);
      aod_.d256 = __lasx_xvfadd_d(aod_.d256, bod_.d256);
      r_.i256 = __lasx_xvilvl_d(aod_.i256, aev_.i256);
    #elif SIMDE_NATURAL_VECTOR_SIZE_LE(128)
      r_.m128d[0] = simde_mm_addsub_pd(a_.m128d[0], b_.m128d[0]);
      r_.m128d[1] = simde_mm_addsub_pd(a_.m128d[1], b_.m128d[1]);
    #else
      SIMDE_VECTORIZE
      for (size_t i = 0 ; i < (sizeof(r_.f64) / sizeof(r_.f64[0])) ; i += 2) {
        r_.f64[ i ] = a_.f64[ i ] - b_.f64[ i ];
        r_.f64[i + 1] = a_.f64[i + 1] + b_.f64[i + 1];
      }
    #endif
    return simde__m256d_from_private(r_);
  #endif
}
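The generic fallback loop in the snippet above already captures the whole operation: subtract in even lanes, add in odd lanes. As a rough sketch of how that loop might widen to eight doubles (512 bits), here is a standalone C version. The struct `vec512d` and the function `addsub_8xf64` are hypothetical names used only for illustration; they are not SIMDe's private types or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 512-bit vector of eight doubles;
   this is NOT SIMDe's simde__m512d / simde__m512d_private type. */
typedef struct {
  double f64[8];
} vec512d;

/* Even lanes: a - b; odd lanes: a + b. Same lane pattern as the
   generic fallback loop of simde_mm256_addsub_pd, widened to 8 lanes. */
static vec512d addsub_8xf64(vec512d a, vec512d b) {
  vec512d r;
  for (size_t i = 0; i < sizeof(r.f64) / sizeof(r.f64[0]); i += 2) {
    r.f64[i]     = a.f64[i]     - b.f64[i];
    r.f64[i + 1] = a.f64[i + 1] + b.f64[i + 1];
  }
  return r;
}
```

With a = {1, 2, 3, 4, 5, 6, 7, 8} and every lane of b equal to 0.5, the result is {0.5, 2.5, 2.5, 4.5, 4.5, 6.5, 6.5, 8.5}.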

I can TRY, but don't expect a great solution. Especially not something that has LoongArch or NEON in it

Would you want the non-FMA addsub in a separate file, or in the same file as simde_mm512_fmaddsub_pd?

Hm, interesting. A _mm512_addsub_pd does not exist.
Should _mm512_addsub_pd be added nevertheless, or only with the simde_ prefix?
Or should I simply embed the functionality of _mm512_addsub_pd in _mm512_fmaddsub_pd and not offer it separately?

mr-c commented

> I can TRY, but don't expect a great solution. Especially not something that has LoongArch or NEON in it

We only need a plain implementation to merge; no SIMDE_*_NATIVE code needed. Others will come along with optimized implementations later. Often the compilers do a great job!
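A plain, non-native fmaddsub of the kind described above might be sketched as a single loop. This assumes the lane pattern documented for `_mm512_fmaddsub_pd` in Intel's intrinsics guide (even lanes compute a*b - c, odd lanes a*b + c). `vec512d` and `fmaddsub_8xf64` are hypothetical names for illustration, not SIMDe code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an eight-lane double vector (512 bits);
   this is NOT SIMDe's simde__m512d type. */
typedef struct {
  double f64[8];
} vec512d;

/* Plain per-lane fmaddsub: even lanes a*b - c, odd lanes a*b + c.
   Written this way, the product and the add/sub may each be rounded
   (unless the compiler contracts them into an FMA), whereas the native
   instruction rounds each lane only once. */
static vec512d fmaddsub_8xf64(vec512d a, vec512d b, vec512d c) {
  vec512d r;
  for (size_t i = 0; i < sizeof(r.f64) / sizeof(r.f64[0]); i += 2) {
    r.f64[i]     = a.f64[i]     * b.f64[i]     - c.f64[i];
    r.f64[i + 1] = a.f64[i + 1] * b.f64[i + 1] + c.f64[i + 1];
  }
  return r;
}
```

One point worth checking in review: if bit-exact agreement with the native instruction matters, calling `fma(a, b, -c)` and `fma(a, b, c)` from `<math.h>` per lane gives the single rounding that the loop above does not guarantee. With small integer inputs, as in the check below, both forms agree exactly.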

> Hm, interesting. A _mm512_addsub_pd does not exist. Should _mm512_addsub_pd be added nevertheless, or only with the simde_ prefix? Or should I simply embed the functionality of _mm512_addsub_pd in _mm512_fmaddsub_pd and not offer it separately?

Your choice. If you need it, then call it simde_x_mm512_addsub_pd, and it can go in the same file.

I've made a first PR, #1246; no clue yet whether it works (tests will follow).
Can you please let me know in the PR whether this goes into the desired direction?

mr-c commented

Fixed in #1246; thank you!