andreas-abel/nanoBench

Missing latency entry for gathers

travisdowns opened this issue · 2 comments

You measure many latency stats for gathers which is awesome (and a very important formalization of the way we think about latency), but I think you are missing the most important one.

That is the 2 -> 1 (address) latency, but through the vector index register rather than the base register. It's probably the most common latency chain you'll have in practice because it generalizes the notion of pointer chasing: the gathered result feeds the index register of the next gather. That is, a loop like:

vpgatherdd ymm0,DWORD PTR [r14+ymm14*1],ymm1
vpor ymm14,ymm0,ymm0

On my SKL machine I measure the same latency (22) for this as for the 3 -> 1 latency.
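To make the dependency chain concrete, here is a minimal scalar sketch in C of what the loop above does logically: each gather's result becomes the index vector of the next gather. It is only a model of the data dependency, not a benchmark and not how nanoBench measures anything. The names `gather_dwords` and `chase` are hypothetical, indices are treated as element indices rather than the byte offsets implied by the `*1` scale, and the mask operand (`ymm1`) is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* vpgatherdd with ymm registers loads 8 dwords */

/* Hypothetical scalar model of one gather: out[i] = base[idx[i]].
   Assumption: idx holds element indices, not byte offsets. */
static void gather_dwords(const uint32_t *base, size_t n,
                          const uint32_t idx[LANES], uint32_t out[LANES]) {
    for (int i = 0; i < LANES; i++) {
        assert(idx[i] < n); /* keep the model in bounds */
        out[i] = base[idx[i]];
    }
}

/* Hypothetical chain: the loaded values become the next index vector,
   so every gather depends on the previous one through the index operand. */
static void chase(const uint32_t *base, size_t n,
                  uint32_t idx[LANES], int steps) {
    uint32_t loaded[LANES];
    for (int s = 0; s < steps; s++) {
        gather_dwords(base, n, idx, loaded);
        for (int i = 0; i < LANES; i++) {
            idx[i] = loaded[i]; /* models: vpor ymm14, ymm0, ymm0 */
        }
    }
}
```

Because step `s + 1` cannot compute its addresses until step `s` has loaded its values, the time per iteration of such a loop is bounded by the index -> result latency, just as in scalar pointer chasing.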

Fixed.