bacpop/pp-sketchlib

Idea for faster distances

Opened this issue · 0 comments

  • Sort the columns of bins (N log N), keeping track of index.
  • Scan through column. Where there is a block of the same value, add one to numerator of all vs all indices in block
  • When closely related, can instead subtract one from those not in the block

This could be faster when either most of the values are the same (close distances) or different.

The output is still N^2 in size. Doing this in a way where distant neighbours are dropped as each column is processed might be a way to make this linear.