/fastpair

FastPair: Data-structure for the dynamic closest-pair problem.

Primary LanguageJupyter NotebookMIT LicenseMIT

FastPair

Data-structure for the dynamic closest-pair problem.

Overview

tag

This project is an implementation of the FastPair dynamic closest-pair data-structure described in David Eppstein's Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs. The data-structure is based on the observation that the conga line data-structure, in practice, does better the more subsets you give to it: even though the worst case time for $k$ subsets is $O(nk\log{(n/k)})$, that worst case seems much harder to reach than the nearest neighbor algorithm.

In the limit of arbitrarily many subsets, each new addition or point moved by a deletion will be in a singleton subset, and the algorithm will differ from nearest neighbors in only a couple of ways:

  1. When we create the initial data structure, we use a conga line rather than all nearest neighbors, to keep the in-degree of each point low, and
  2. When we insert a point, we don't bother updating other points' neighbors.
Complexity
Total space $20n$ bytes (could be reduced to $4n$ at some cost in update time)
Time per insertion or single distance update $O(n)$
Time per deletion or point update $O(n)$ expected, $O(n^2)$ worst case
Time per closest pair $O(n)$

This Python version of the algorithm combines ideas and code from the closest-pair data structure testbed (C++) developed around a series of papers by Eppstein et al.

Installation

FastPairs has not yet been uploaded to PyPi, as we are currently at the 'pre-release' stage*. Having said that you should be able to install it via pip directly from the GitHub repository with:

pip install git+git://github.com/carsonfarmer/fastpair.git

You can also install FastPair by cloning the GitHub repository and using the setup script:

git clone https://github.com/carsonfarmer/fastpair.git
cd fastpair
pip install .

* This means the API is not set, and subject to crazy changes at any time!

Testing

Continuous Integration codecov

FastPair comes with a comprehensive preliminary range of tests. To run tests, install as an editable, development package:

pip install -e .[tests]

This will install fastpair itself, its functional dependencies, and the testing/development dependencies. Tests can be run with pytest as follows:

pytest -v fastpair --cov fastpair

Currently fastpair is tested against Python 3.{10,11,12}.

Utilizing FastPair

This notebooks linked below are designed as interactive, minimum tutorials in working with fastpair and require additional dependencies, which can be installed with:

pip install -e .[tests,notebooks]

License

Copyright © 2016, Carson J. Q. Farmer
Copyright © 2002-2015, David Eppstein
Licensed under the MIT License.