Try caching the query vector's FloatVector segments when computing distance
alexklibisz opened this issue · 2 comments
Background
Here is the l2Distance method from PanamaFloatVectorOps:
Note how lines 77 and 78 are instantiating a FloatVector from an array. We do this in several iterations because the FloatVectors need to be a specific length in order to leverage the SIMD operations.
v1
and pv1
correspond to the query vector. v2
and pv2
correspond to the vector stored in the index. In a typical query, we compute l2Distance for exactly one query vector against many stored vectors.
So, we should be able to memoize or cache an array of segments for v1
and then avoid re-copying v1
into pv1
, i.e., avoid line 77 in the method.
Unclear if this will have any positive effect, but it's worth a quick try.
Deliverables
- Try caching/memoizing the segments of
v1
to avoid re-computing them.
Related Issues
No response
I took a first pass at implementing this in #605. The benchmark looks promising on my M1 Macbook, but not so much on my EC2 r6i.8xlarge instance.
On M1 Macbook:
[info] Benchmark Mode Cnt Score Error Units
[info] CachedL2DistanceBenchmark.baseline thrpt 6 2584985.052 ± 9945.960 ops/s
[info] CachedL2DistanceBenchmark.cached thrpt 6 4461915.029 ± 43007.018 ops/s
On EC2 r6i.8xlarge:
[info] Benchmark Mode Cnt Score Error Units
[info] CachedL2DistanceBenchmark.baseline thrpt 6 2939742.574 ± 273069.456 ops/s
[info] CachedL2DistanceBenchmark.cached thrpt 6 2698700.860 ± 13063.707 ops/s
It seems on the M1, we actually benefit from pre-computing and fetching the chunks from memory. Not so much on standard x86 architecture on the EC2 instance.
Closing as I'm skeptical that this would work.