greg7mdp/parallel-hashmap

Very minor optimization: _mm_abs_epi8 instead of _mm_sign_epi8

Myriachan opened this issue · 1 comment

It'd be a very minor optimization, if it does anything measurable at all, but:

static_cast<uint32_t>(_mm_movemask_epi8(_mm_sign_epi8(ctrl, ctrl)))

Could be this instead:

static_cast<uint32_t>(_mm_movemask_epi8(_mm_abs_epi8(ctrl)))
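Why the two expressions are interchangeable: applied lane by lane, psignb with both operands equal computes exactly the same bits as pabsb. The sketch below is a scalar model, not the intrinsics themselves, and every helper name in it is hypothetical. It checks all 256 byte values. It also assumes the SwissTable control-byte encoding that parallel-hashmap inherits from Abseil (kEmpty = -128 / 0x80, kDeleted = -2, kSentinel = -1, full slots 0..127). Under that encoding, 0x80 is the only byte whose high bit survives and reaches _mm_movemask_epi8.

```cpp
#include <cassert>   // assert(), used by the checks on this sketch
#include <cstdint>

// Scalar model of one byte lane of _mm_sign_epi8(a, b) (psignb), on raw bits:
// negate a if b is negative, zero it if b is zero, keep it otherwise.
// Negation wraps, so 0x80 (-128) stays 0x80.
std::uint8_t sign_lane(std::uint8_t a, std::uint8_t b) {
    if (b & 0x80u) return static_cast<std::uint8_t>(0u - a);
    if (b == 0) return 0;
    return a;
}

// Scalar model of one byte lane of _mm_abs_epi8(a) (pabsb); |-128| is 0x80.
std::uint8_t abs_lane(std::uint8_t a) {
    return (a & 0x80u) ? static_cast<std::uint8_t>(0u - a) : a;
}

// True if sign(x, x) and abs(x) agree bit-for-bit on every byte value,
// which is what makes the two movemask expressions interchangeable.
bool sign_self_equals_abs() {
    for (unsigned v = 0; v < 256; ++v) {
        const auto x = static_cast<std::uint8_t>(v);
        if (sign_lane(x, x) != abs_lane(x)) return false;
    }
    return true;
}

// The bit _mm_movemask_epi8 would collect from one lane (its high bit).
bool high_bit(std::uint8_t lane) { return (lane & 0x80u) != 0; }

// Assumed Abseil-style encoding: kEmpty = 0x80, kDeleted = 0xFE, kSentinel = 0xFF.
// Only abs_lane(0x80) keeps its high bit, so only empty slots are reported.
```

Zero and positive bytes pass through both operations unchanged. Negative bytes are negated by both. So the results can only differ if psignb's zeroing case were reached with a nonzero input, which cannot happen when both operands are the same byte.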

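pabsb (_mm_abs_epi8) is SSSE3, just like psignb (_mm_sign_epi8), so the instruction-set requirements don't change. The advantage is that pabsb is non-destructive: it reads one source and writes a separate destination, whereas the legacy SSE encoding of psignb overwrites its first operand, so the compiler may need an extra register copy if ctrl is still needed afterwards. That could produce slightly better code. You'd have to try it, though...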
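One way to try it is to compile a pair of functions like the ones below and compare the generated assembly. They are a minimal sketch. match_empty_sign and match_empty_abs are hypothetical free functions modelled on the quoted expression, not parallel-hashmap's API. They use GCC/Clang's target("ssse3") attribute so that no -mssse3 flag is needed, and they fall back to a plain loop off x86 so the sketch still builds there. Note that with AVX enabled, the VEX-encoded vpsignb also takes three operands, so any saving would most likely show up only in non-AVX builds.

```cpp
#include <cassert>   // assert(), used by the checks on this sketch
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_movemask_epi8
#include <tmmintrin.h>  // SSSE3: _mm_sign_epi8, _mm_abs_epi8

// Hypothetical helpers: `ctrl16` points at a 16-byte group of control bytes.
// target("ssse3") enables SSSE3 for these functions only (GCC/Clang).
__attribute__((target("ssse3")))
std::uint32_t match_empty_sign(const std::uint8_t* ctrl16) {
    const __m128i ctrl = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl16));
    return static_cast<std::uint32_t>(_mm_movemask_epi8(_mm_sign_epi8(ctrl, ctrl)));
}

__attribute__((target("ssse3")))
std::uint32_t match_empty_abs(const std::uint8_t* ctrl16) {
    const __m128i ctrl = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl16));
    return static_cast<std::uint32_t>(_mm_movemask_epi8(_mm_abs_epi8(ctrl)));
}
#else
// Portable stand-in so the sketch builds off x86: both expressions set
// bit i exactly when control byte i is 0x80 (kEmpty in the assumed encoding).
static std::uint32_t match_0x80(const std::uint8_t* ctrl16) {
    std::uint32_t m = 0;
    for (int i = 0; i < 16; ++i)
        m |= static_cast<std::uint32_t>(ctrl16[i] == 0x80) << i;
    return m;
}
std::uint32_t match_empty_sign(const std::uint8_t* c) { return match_0x80(c); }
std::uint32_t match_empty_abs(const std::uint8_t* c) { return match_0x80(c); }
#endif
```

Building with -O2 -S, once as is and once with -mavx, shows which instructions each version uses. In isolation like this, ctrl is dead after the operation, so both may well compile to equally short code. Any extra register copy would only appear where the caller reuses ctrl.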

Thanks for the suggestion, I really appreciate it.

Since I am not an expert in SSE2, I'm hesitant to make a change, especially since you mention it would be a very minor (if any) improvement, and I'd rather err on the safe side.

So I'll close the issue, but feel free to let me know if you can measure any significant improvement with this (or any other change).