BuzzHash

A Julia package based on S. Dasgupta, C. F. Stevens, and S. Navlakha (2017). A neural algorithm for a fundamental computing problem. Science, 358, 6364:793-796.

A locality-sensitive hashing algorithm based on Dasgupta et al., A neural algorithm for a fundamental computing problem (preprint).

(See also Groschner et al., Dendritic Integration of Sensory Evidence in Perceptual Decision-Making.)

An implementation of the algorithm described by Dasgupta et al. in the above reference and recently published in Science [1]. The algorithm is based on the olfactory system of Drosophila melanogaster, though it is not a simulation of that system. It is, instead, a biomimetic abstraction applicable to the computer science problem of similarity search, but it might well serve as an abstract model of neural organization applicable to other organisms and neural subsystems. In that regard, the authors mention three possibilities: mouse olfaction, rat cerebellum, and rat hippocampus, including entorhinal cortex, dentate gyrus, and hilar cells.

The algorithm is simple. Its input is a real vector, x, which in the fly's case has 50 components representing the aggregate output of 50 types of olfactory receptor. The first step is to form x-mean(x), i.e., to subtract the mean of x from each of its components. This mimics the biological step known as divisive normalization. The mean-centered input is then multiplied by a sparse 0/1 matrix, A, which expands the number of components in x-mean(x). In the fly's case, the 50 components are expanded to 2000, each of the 2000 values aggregating about 6 of the original components. The non-zero components of A are randomly chosen and, of course, once chosen are fixed. The final step is to zeroize all but the largest few components of A(x-mean(x)) and (by default) to set the retained components to 1, leaving a sparse vector that acts as a tag. (This clipping of the retained components to 1 may be skipped by calling with clip=false.) In the fly's case the top 5% are retained.
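The steps above can be sketched in Julia as follows. This is an illustrative sketch, not the package's own implementation: the function names `sketch_projection` and `sketch_hash` and the exact way the random matrix is built are assumptions for demonstration purposes.

```julia
using SparseArrays, Random, Statistics

# Sketch only: build a sparse 0/1 projection with PN_per_KC ones per row,
# mimicking the expansion from PN inputs to KC expanded units.
function sketch_projection(KC::Int, PN::Int, PN_per_KC::Int)
    A = spzeros(Float64, KC, PN)
    for r in 1:KC                                  # each expanded unit samples a few inputs
        A[r, randperm(PN)[1:PN_per_KC]] .= 1.0
    end
    return A
end

# Sketch of the hash: mean-center, expand, keep the topN values,
# and (by default) clip the retained values to 1.
function sketch_hash(A::AbstractMatrix, x::AbstractVector, topN::Integer; clip::Bool=true)
    y = A * (x .- mean(x))                         # divisive-normalization analogue, then expansion
    keep = partialsortperm(y, 1:topN; rev=true)    # indices of the topN largest values
    tag = zeros(eltype(y), length(y))
    tag[keep] .= clip ? one(eltype(y)) : y[keep]
    return tag
end

# Fly-like dimensions: 50 receptor types expanded to 2000 units,
# each aggregating about 6 inputs; retain the top 5% (100 of 2000).
A = sketch_projection(2000, 50, 6)
x = rand(50)
h = sketch_hash(A, x, 100)
```

With clip=true the tag is a 0/1 vector with exactly topN ones, so two similar inputs tend to share many tag positions, which is what makes the hash locality-sensitive.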

With high probability, the columns of matrix A are linearly independent, which means the input, x-mean(x), can be recovered from the product A(x-mean(x)). Note, however, that the final step, in which all but the largest values of A(x-mean(x)) are zeroized, makes recovery from the final hash approximate at best.

This package provides three functions: sprand_fd to form the matrix A; buzzhash(A, x, topN) to apply the algorithm to x, retaining the topN maximum values; and inverse(A) to form the matrix which recovers x-mean(x) from the product A(x-mean(x)). The notebook usage.ipynb illustrates basic usage.
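The recovery step can be illustrated with the Moore-Penrose pseudoinverse. This is a sketch under stated assumptions: the construction of A below and the use of `pinv` from LinearAlgebra are stand-ins, since the package's inverse(A) may compute its recovery matrix differently.

```julia
using SparseArrays, Random, Statistics, LinearAlgebra

# Illustrative construction of A: 2000 rows, each with 6 randomly placed ones
# among 50 columns (an assumed analogue of sprand_fd, not the package's code).
A = spzeros(Float64, 2000, 50)
for r in 1:2000
    A[r, randperm(50)[1:6]] .= 1.0
end

x = rand(50)
xc = x .- mean(x)          # mean-centered input
y = A * xc                 # expanded representation, before top-N zeroizing

# With high probability A has full column rank, so a left inverse exists.
Ainv = pinv(Matrix(A))     # Ainv * A ≈ I
xrec = Ainv * y            # recovers xc from the (unclipped) product
```

Recovery is exact (up to floating-point error) only because no values of y were zeroized here; applying it to the final sparse tag would be approximate at best, as noted above.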

Interested parties should also see @dataplayer12's Python implementation and his very nice explanatory post at Medium.

TODO: Initial implementations of sprand_fd(KC, PN, PN_per_KC) and inverse(A) require optimization. They perform very poorly for large KC.

References:

[1] S. Dasgupta, C. F. Stevens, and S. Navlakha (2017). A neural algorithm for a fundamental computing problem. Science, 358, 6364:793-796.

[2] Charles F. Stevens. A statistical property of fly odor responses is conserved across odors.

[3] Charles F. Stevens. What the fly's nose tells the fly's brain.

[4] John Myles White. MNIST.jl.

[future] Cengiz Pehlevan, Alexander Genkin, and Dmitri B. Chklovskii. A clustering neural network model of insect olfaction. doi: https://doi.org/10.1101/226746

[future] A. A. Fenton et al. Unmasking the CA1 ensemble place code by exposures to small and large environments: more place cells and multiple, irregularly-arranged, and expanded place fields in the larger space.

[future] Kiehn and Forssberg. Scientific Background: The Brain's Navigational Place and Grid Cell System (PDF).