Port Intel `fec_encode_simd` to `fec_encode` but with unaligned access intrinsics
WojciechMigda opened this issue · 3 comments
WojciechMigda commented
Port Intel `fec_encode_simd` to `fec_encode` but with unaligned access intrinsics
WojciechMigda commented
WojciechMigda commented
void _mm_maskmoveu_si128 (__m128i a, __m128i mask, char* mem_addr)
Synopsis
void _mm_maskmoveu_si128 (__m128i a, __m128i mask, char* mem_addr)
#include <emmintrin.h>
Instruction: maskmovdqu xmm, xmm
CPUID Flags: SSE2
Description
Conditionally store 8-bit integer elements from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint. mem_addr does not need to be aligned on any particular boundary.
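As a rough illustration of the description above, the sketch below writes only the first `n` bytes of a vector to an address with no alignment guarantee. The helper name `store_first_n_bytes` and the sliding-window mask table are assumptions made for this example; they are not taken from the FEC code discussed in this issue.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_maskmoveu_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any library): store the first n bytes
 * (0..16) of v to dst. dst may be unaligned. */
static void store_first_n_bytes(uint8_t *dst, __m128i v, size_t n)
{
    /* 16 bytes of 0xFF followed by 16 bytes of 0x00. */
    static const uint8_t window[32] = {
        0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
        0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    };

    if (n > 16) {
        n = 16;
    }

    /* Unaligned load from a sliding window: the first n mask bytes are
     * 0xFF (highest bit set), the remaining 16 - n bytes are 0x00. */
    __m128i mask = _mm_loadu_si128((const __m128i *)(window + 16 - n));

    /* Only bytes whose mask byte has its highest bit set are written. */
    _mm_maskmoveu_si128(v, mask, (char *)dst);
}
```

For example, `store_first_n_bytes(buf + 1, _mm_set1_epi8(0x55), 5)` should overwrite `buf[1]` through `buf[5]` and leave the neighbouring bytes untouched. Because the store carries a non-temporal hint, code that shares the buffer with other threads would probably want an `_mm_sfence()` before publishing it.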
Operation
FOR j := 0 to 15
    i := j*8
    IF mask[i+7]
        MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
    FI
ENDFOR
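The pseudocode above indexes everything in bits, so `i` is the bit offset of byte `j`. Read that way, byte `j` of `a` is written to `mem_addr + j` whenever bit 7 of mask byte `j` is set. A minimal scalar model of that reading is sketched below; the function name `maskmove_bytes_ref` is made up for this example, and the model ignores the non-temporal hint.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the operation above; not a library function.
 * a and mask hold the 16 bytes of the two vector operands. */
static void maskmove_bytes_ref(const uint8_t a[16],
                               const uint8_t mask[16],
                               uint8_t *mem_addr)
{
    for (size_t j = 0; j < 16; ++j) {
        /* mask[i+7] in the pseudocode is the highest bit of mask byte j. */
        if (mask[j] & 0x80) {
            mem_addr[j] = a[j];
        }
    }
}
```

A model like this can be handy as a reference when unit-testing a SIMD port on inputs with arbitrary alignment and tail lengths.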
Latency and Throughput
Architecture | Latency | Throughput (CPI) |
---|---|---|
Skylake | 6 | 1 |