Bug in _mm_storel_epi64
andrewevstyukhin opened this issue · 1 comments
Hi,
The _mm_storel_epi64 intrinsic, which maps to the SSE2 instruction movq m64, xmm, stores only the low 64 bits of a 128-bit register.
The NEON version first reads the high 64 bits from memory and then writes them back together with the low half. When the destination is only 8 bytes long, that is an out-of-bounds access, which is undefined behavior in C++ and breaks execution.
When porting by hand, I usually used vst1 instead. For example:
alignas(8) uint8_t alphas[8];
_mm_storel_epi64(reinterpret_cast<__m128i*>(alphas), mt);
BTW, the cast also triggers PVS-Studio warning V641: The size of the 'alphas' buffer is not a multiple of the element size of the type '__m128i'.
=>
alignas(8) uint8_t alphas[8];
vst1_u8(alphas, mt);
So vst1_u64((uint64_t*)a, vget_low_u64(vreinterpretq_u64_m128i(b))); seems like a better solution.
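To make the intended semantics concrete, here is a minimal sketch of a store that writes exactly 8 bytes. The names vec128, make_vec128, store_low64 and store_low64_preserves_tail are hypothetical helpers made up for this example, not part of the library. The NEON branch uses the vst1_u64 / vget_low_u64 pairing suggested above, the SSE2 branch uses the original intrinsic for comparison, and the last branch is a plain memcpy fallback so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
using vec128 = uint64x2_t;  // hypothetical alias for this sketch

inline vec128 make_vec128(uint64_t lo, uint64_t hi) {
    return vcombine_u64(vcreate_u64(lo), vcreate_u64(hi));
}

// Writes only the low 64 bits; nothing is loaded from dst.
inline void store_low64(void* dst, vec128 v) {
    assert(dst != nullptr);
    vst1_u64(static_cast<uint64_t*>(dst), vget_low_u64(v));
}
#elif defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
using vec128 = __m128i;  // hypothetical alias for this sketch

inline vec128 make_vec128(uint64_t lo, uint64_t hi) {
    return _mm_set_epi64x(static_cast<long long>(hi),
                          static_cast<long long>(lo));
}

// Reference behavior: movq m64, xmm writes 8 bytes.
inline void store_low64(void* dst, vec128 v) {
    assert(dst != nullptr);
    _mm_storel_epi64(static_cast<__m128i*>(dst), v);
}
#else
// Portable fallback that models a 128-bit register as two 64-bit halves.
struct vec128 { uint64_t lo, hi; };

inline vec128 make_vec128(uint64_t lo, uint64_t hi) { return {lo, hi}; }

inline void store_low64(void* dst, vec128 v) {
    assert(dst != nullptr);
    std::memcpy(dst, &v.lo, sizeof v.lo);
}
#endif

// Checks that the first 8 bytes receive the low half and that the
// 8 guard bytes after them keep their fill value.
inline bool store_low64_preserves_tail() {
    alignas(16) uint8_t buf[16];
    std::memset(buf, 0xAA, sizeof buf);
    const uint64_t lo = 0x0807060504030201ULL;
    store_low64(buf, make_vec128(lo, 0xFFFFFFFFFFFFFFFFULL));
    if (std::memcmp(buf, &lo, sizeof lo) != 0) return false;
    for (int i = 8; i < 16; ++i) {
        if (buf[i] != 0xAA) return false;
    }
    return true;
}
```

Note that this check only inspects the bytes that end up in memory. A read-modify-write of the high half would write the same guard bytes back and still pass, so a memory-checking tool such as AddressSanitizer is probably the more reliable way to catch the out-of-bounds access itself.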
Thanks, @andrewevstyukhin, for pointing this out. Can you send a pull request accordingly?