Very minor optimization: _mm_abs_epi8 instead of _mm_sign_epi8
Myriachan opened this issue · 1 comment
Myriachan commented
It'd be a very minor optimization, if it does anything measurable at all, but:
```cpp
static_cast<uint32_t>(_mm_movemask_epi8(_mm_sign_epi8(ctrl, ctrl)));
```
Could be this instead:
```cpp
static_cast<uint32_t>(_mm_movemask_epi8(_mm_abs_epi8(ctrl)));
```
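`pabsb` is also SSSE3, so the requirements don't change. The advantage is that `pabsb` is non-destructive: its destination register can differ from its source, whereas `psignb ctrl, ctrl` overwrites `ctrl`. That could produce slightly better code. You'd have to try it, though...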
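A minimal sketch of why the swap is safe, assuming (as the quoted snippet implies) that an empty control byte is encoded as -128 (`0x80`). For every byte, `sign(a, a)` and `abs(a)` produce the same value, including -128, which stays `0x80` in both. So `_mm_movemask_epi8` reports exactly the same lanes. The function names below are hypothetical, and `__attribute__((target("ssse3")))` is a GCC/Clang extension used here only so the snippet builds without `-mssse3`. Note that with AVX enabled the compiler emits the VEX forms (`vpsignb`/`vpabsb`), which both take a separate destination, so any codegen difference would likely disappear.

```cpp
#include <cstdint>
#include <tmmintrin.h>  // SSSE3 intrinsics: _mm_sign_epi8, _mm_abs_epi8

// Hypothetical helper: the original form. In the legacy SSE encoding,
// psignb overwrites its first operand, so if ctrl is still needed afterwards
// the compiler may have to copy it first (movdqa).
__attribute__((target("ssse3")))
inline uint32_t match_empty_sign(__m128i ctrl) {
    return static_cast<uint32_t>(_mm_movemask_epi8(_mm_sign_epi8(ctrl, ctrl)));
}

// Hypothetical helper: the suggested form. pabsb can write its result to a
// register other than its source, leaving ctrl intact.
__attribute__((target("ssse3")))
inline uint32_t match_empty_abs(__m128i ctrl) {
    return static_cast<uint32_t>(_mm_movemask_epi8(_mm_abs_epi8(ctrl)));
}

// Exhaustively checks all 256 byte values by broadcasting each one across a
// 16-byte group. Returns true if both forms produce the same mask everywhere.
__attribute__((target("ssse3")))
inline bool sign_and_abs_agree() {
    for (int v = -128; v <= 127; ++v) {
        __m128i ctrl = _mm_set1_epi8(static_cast<char>(v));
        if (match_empty_sign(ctrl) != match_empty_abs(ctrl)) {
            return false;
        }
    }
    return true;
}
```

Only the -128 lanes keep their high bit set after either operation. For example, a group whose bytes 0 and 4 are -128 and whose other bytes are non-negative (or any value other than -128) yields the mask `0x11` from both helpers.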
greg7mdp commented
Thanks for the suggestion, I really appreciate it.
Since I am not an expert in SSE2, I'm hesitant to make a change, especially since you mention it would be a very minor (if any) improvement, and I'd rather err on the safe side.
So I'll close the issue, but feel free to let me know if you can measure any significant improvement with this (or any other change).