investigate linear and polynomial interpolation, b-trees for I.
moon-chilled opened this issue · 3 comments
moon-chilled commented
investigate linear and polynomial interpolation, b-trees for I.
moon-chilled commented
also eytzinger https://arxiv.org/pdf/1509.05053.pdf https://algorithmica.org/en/eytzinger cf
moon-chilled commented
eytzinger probably wastes bandwidth if we can do multiple searches in parallel; otoh, if we are bottlenecked by latency, then eytzinger is good because prefetches don't take up rob space
moon-chilled commented
b-tree budgets one gpr per key. Vs 2 with the current scheme, or 2 vectors w/gather (which I'm warming to considering the comparatively better performance on intel & with avx512). Gather exposes more parallelism, but b-tree is more memory-efficient. Not sure which wins, especially in the face of potential partitioning of y. b-tree requires preprocessing though, which is a win for gather at some size classes.