jsoftware/jsource

investigate linear and polynomial interpolation, b-trees for I.

moon-chilled opened this issue · 3 comments

investigate linear and polynomial interpolation, b-trees for I.

eytzinger probably wastes bandwidth if we can do multiple searches in parallel; otoh, if we are bottlenecked by latency, then eytzinger is good because prefetches don't take up rob space

b-tree budgets one gpr per key. Vs 2 with the current scheme, or 2 vectors w/gather (which I'm warming to considering the comparatively better performance on intel & with avx512). Gather exposes more parallelism, but b-tree is more memory-efficient. Not sure which wins, especially in the face of potential partitioning of y. b-tree requires preprocessing though, which is a win for gather at some size classes.