Fast NCA

Fast implementation of the Neighborhood Component Analysis algorithm in Python. The ideas behind some of the design choices are expanded further in my Master's thesis, Fast low-rank metric learning.

Features:

  • Sklearn-like API
  • Evaluating the gradient costs about the same as evaluating the objective function
  • Avoids overflows when the scale of the metric is large (see the sketch after this list)
  • Mini-batch version (work in progress)
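
Regarding the overflow point: NCA turns negative squared distances in the projected space into neighbour probabilities through a softmax, and the exponentials can overflow or underflow once the learned metric has a large scale. The snippet below is only an illustrative sketch of the standard stabilisation trick, not this repository's implementation; nca_objective is a hypothetical helper written with NumPy only.

import numpy as np

def nca_objective(A, X, y):
    # Illustrative sketch only: NCA objective for a linear map A (d_out x d_in),
    # data X (n x d_in) and integer labels y (n,).
    Z = X @ A.T                                     # project the data
    sq_dists = np.square(Z[:, None, :] - Z[None, :, :]).sum(axis=2)
    np.fill_diagonal(sq_dists, np.inf)              # a point never picks itself as neighbour
    # Stabilise the softmax: shift each row by its smallest distance so the largest
    # exponent is exp(0), even when the scale of the metric is large.
    shifted = sq_dists - sq_dists.min(axis=1, keepdims=True)
    p = np.exp(-shifted)
    p /= p.sum(axis=1, keepdims=True)               # p[i, j] = softmax over -||A x_i - A x_j||^2
    same_class = y[:, None] == y[None, :]
    return (p * same_class).sum()                   # expected number of correctly classified points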

Examples

Sample usage from Python:

from nca import NCA

nca = NCA()            # sklearn-like estimator
nca.fit(X, y)          # X: (n_samples, n_features) data, y: class labels
X = nca.transform(X)   # project the data into the learned space
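
In practice, the learned transformation is usually paired with a nearest-neighbour classifier (see the corresponding TODO item below). A minimal sketch of that workflow, assuming only the fit/transform API shown above and using scikit-learn's Wine dataset and KNeighborsClassifier:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from nca import NCA

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nca = NCA()
nca.fit(X_train, y_train)                        # learn the transformation on the training split
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(nca.transform(X_train), y_train)         # nearest neighbours measured in the learned space
print(knn.score(nca.transform(X_test), y_test))  # accuracy on the held-out split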

For a more complete example, run the example.py script. Among other options, the script accepts the type of model and the dataset:

python example.py --model nca --data wine

For a complete description of the available options, pass the help flag:

python example.py -h

Installation

The code depends on the usual Python scientific environment: NumPy, SciPy, Scikit-learn. The required packages are listed in the requirements.txt file and can be installed in a virtual environment as follows:

virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt

Metric learning

Benchmarks

Related work

  • Other NCA implementations (Python and MATLAB)
  • Other metric learning algorithms

Acknowledgements

Special thanks to Iain Murray for writing a first version of this code, teaching me about automatic differentiation, and supervising my Master's thesis project.

TODO

  • Add requirements
  • Add examples
  • Add example using NCA with Nearest Neighbour
  • Test numerical stability
  • Add argument parsing for example script
  • Add some visualizations
  • Add PCA to the list of models
  • Add concentric circles and noise to the list of datasets
  • Create separate modules, for example: data, models
  • Package the code
  • Do not compute gradients when only the cost function is required
  • Add example on MNIST
  • Add tests
  • Add gradient check tests
  • Compute timings
  • Big O notation for memory and computation
  • Write documentation
  • Implement version with mini-batches
  • Provide links to other implementations and outline differences
  • Motivate metric learning
  • Implement nearest mean metric learning