WojciechMigda/zfex

Port Intel `fec_encode_simd` to `fec_encode` but with unaligned access intrinsics

WojciechMigda opened this issue · 3 comments

void _mm_maskmoveu_si128 (__m128i a, __m128i mask, char* mem_addr)

Synopsis

void _mm_maskmoveu_si128 (__m128i a, __m128i mask, char* mem_addr)
#include <emmintrin.h>
Instruction: maskmovdqu xmm, xmm
CPUID Flags: SSE2

Description

Conditionally store 8-bit integer elements from `a` into memory using `mask` (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint. `mem_addr` does not need to be aligned on any particular boundary.

Operation

FOR j := 0 to 15
	i := j*8
	IF mask[i+7]
		MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
	FI
ENDFOR
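
As a rough illustration of the operation above, the following C sketch uses `_mm_maskmoveu_si128` to write only the first `n` bytes of a 16-byte vector to an unaligned destination. The helper name `store_first_n_bytes` and its parameters are hypothetical and are not part of zfex or of Intel's `fec_encode_simd`; whether a masked store like this suits the port is an open question, since the store carries a non-temporal hint.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_maskmoveu_si128, _mm_loadu_si128 */
#include <stddef.h>

/* Hypothetical helper (not part of zfex): store the first n bytes (0..16)
 * of v at dst and leave the remaining bytes of the 16-byte window untouched.
 * dst does not need to be 16-byte aligned. */
static void store_first_n_bytes(unsigned char *dst, __m128i v, size_t n)
{
    unsigned char mask_bytes[16];
    size_t j;

    assert(n <= 16);

    /* Only the highest bit of each mask byte is inspected by the instruction. */
    for (j = 0; j < 16; ++j)
        mask_bytes[j] = (j < n) ? 0x80 : 0x00;

    /* Unaligned load of the mask, then the conditional byte-wise store. */
    __m128i mask = _mm_loadu_si128((const __m128i *)mask_bytes);
    _mm_maskmoveu_si128(v, mask, (char *)dst);

    /* The store is non-temporal and therefore weakly ordered; the fence
     * orders it before later stores, which matters if another thread
     * consumes dst. */
    _mm_sfence();
}
```

Calling `store_first_n_bytes(buf + 3, _mm_set1_epi8(0x11), 5)` on a sufficiently large buffer should overwrite `buf[3]` through `buf[7]` only. This sketch keeps the whole 16-byte window inside valid memory and makes no assumption about how the instruction behaves when masked-out bytes fall on an unmapped page.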

Latency and Throughput

| Architecture | Latency | Throughput (CPI) |
|--------------|---------|------------------|
| Skylake      | 6       | 1                |