Missing latency entry for gathers
travisdowns opened this issue · 2 comments
travisdowns commented
You measure many latency stats for gathers which is awesome (and a very important formalization of the way we think about latency), but I think you are missing the most important one.
That is the 2 -> 1 (address) latency, but through the vector index register rather than the base register. It is probably the most common latency chain you'll see in practice, because it generalizes the notion of pointer chasing: the result of one gather becomes the index vector of the next. That is, a loop like:
vpgatherdd ymm0,DWORD PTR [r14+ymm14*1],ymm1
vpor ymm14,ymm0,ymm0
On my SKL machine I measure a latency of 22 cycles for this, the same as the 3 -> 1 latency.
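To make the dependency explicit, here is a minimal scalar sketch in C of the chain that loop forms. It is only a model, not a benchmark: the names `gather_step` and `chase` are hypothetical, all lanes are assumed active (the mask register is ignored), and indices are treated as element indices rather than the byte offsets that a scale of 1 gives in the instruction above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* a ymm register holds eight 32-bit lanes */

/* Scalar model of one gather with every lane active:
   out[i] = base[idx[i]] for each lane i. */
static void gather_step(const uint32_t *base, size_t n,
                        const uint32_t idx[LANES], uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(idx[i] < n); /* stay inside the table */
        out[i] = base[idx[i]];
    }
}

/* Run `steps` dependent gathers. The result of each gather becomes the
   index vector of the next one, so no gather can start before the
   previous one has finished: pointer chasing, eight lanes at a time. */
static void chase(const uint32_t *base, size_t n,
                  uint32_t idx[LANES], unsigned steps)
{
    uint32_t next[LANES];
    for (unsigned s = 0; s < steps; s++) {
        gather_step(base, n, idx, next);
        for (int i = 0; i < LANES; i++)
            idx[i] = next[i]; /* plays the role of the vpor that copies ymm0 into ymm14 */
    }
}
```

With a table where `table[i] = (i + 3) % 16`, calling `chase` for 4 steps on the indices 0..7 advances every lane by 12 positions modulo 16. The time per step of the real instruction sequence is what the index -> result latency measures.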
andreas-abel commented
Fixed.
travisdowns commented
Thanks @andreas-abel!