Alike

A simple-but-useful kNN library for NodeJS, comparing JSON Objects using Euclidean distances, returning top k closest objects.

Supports normalized, weighted Euclidean distances: attributes can be normalized by their standard deviation.
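As a rough illustration of what a normalized, weighted distance can look like, here is a minimal sketch. The helper name weightedDistance and its stdvs argument are hypothetical, and it assumes that a missing weight counts as 1 and that the distance is the sum of squared differences (which is what the debug example in the option list below suggests). It is not Alike's internal code.

```javascript
// Hypothetical helper for illustration only; not Alike's internal code.
// Computes a weighted, optionally standardized, squared Euclidean distance
// over every attribute present in `subject`.
function weightedDistance(subject, object, weights = {}, stdvs = {}) {
  const details = {};
  let distance = 0;
  for (const attr of Object.keys(subject)) {
    // Assumption: an attribute without an explicit weight counts fully (1).
    const weight = attr in weights ? weights[attr] : 1;
    // Assumption: skip scaling when no (or a zero) standard deviation is given.
    const scale = stdvs[attr] || 1;
    const delta = (subject[attr] - object[attr]) / scale;
    details[attr] = weight * delta * delta;
    distance += details[attr];
  }
  return { distance, details };
}

const result = weightedDistance({ a: 0, b: 0 }, { a: 3, b: 4 });
console.log(result); // { distance: 25, details: { a: 9, b: 16 } }
```

Dividing each difference by the attribute's standard deviation keeps an attribute with a large numeric range from dominating the others.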

Features key and filter options to do the data assembly for you, Lisp style!

k-Nearest Neighbour function

subject:  vantage point object - will consider each attribute present in this object as a feature
objects:  array of objects that should all have at least the attributes of subject
options:
    - k: (default = unlimited) specifies how many objects to return
    - standardize: (default = false) if true, standardizes every attribute by its standard deviation; set this to true if your attributes are not on the same scale
    - weights: (default = {}) a hash describing the weights of each attribute
    - key: (default = none) a key function to map over objects, to be used if the attributes to compare are nested inside each object
        e.g. if subject is {a:0} and objects are [{x: {a: 0}}, {x: {a: 2}}], then provide key: function(o) {return o.x}
    - filter: (default = none) a filter function that returns true for items to be considered
        e.g. to only consider objects with non-negative a: function(o) {return o.a >= 0}
    - debug: (default = false) if true, for every object will return distances of individual attributes as well as the overall distance from the subject under a property called 'debug'
        e.g. if subject is {a:0, b:0} and object is {a:3, b:4}, the returned object will be {a: 3, b: 4, debug: {distance:25, details: {a: 9, b: 16}}}
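To make the interplay of k, key and filter concrete, here is a self-contained sketch. knnSketch is a hypothetical stand-in, not Alike's source, and it bakes in two assumptions the list above does not confirm: the filter is applied to the raw object before key is mapped over it, and the original (un-keyed) objects are what gets returned.

```javascript
// Hypothetical stand-in for illustration only; this is not Alike's source.
function knnSketch(subject, objects, options = {}) {
  const { k, key, filter } = options;
  const attrs = Object.keys(subject);

  return objects
    // Assumption: the filter sees the raw object, before `key` is applied.
    .filter((o) => (filter ? filter(o) : true))
    .map((o) => {
      // `key` digs out the nested features that are compared to the subject.
      const features = key ? key(o) : o;
      const distance = attrs.reduce(
        (sum, attr) => sum + (subject[attr] - features[attr]) ** 2,
        0
      );
      return { object: o, distance };
    })
    .sort((x, y) => x.distance - y.distance)
    // No k means "unlimited": keep everything that passed the filter.
    .slice(0, k === undefined ? objects.length : k)
    // Assumption: the original objects are returned, not the keyed ones.
    .map((entry) => entry.object);
}

const objects = [{ x: { a: 5 } }, { x: { a: 0 } }, { x: { a: -2 } }, { x: { a: 2 } }];
const nearest = knnSketch({ a: 0 }, objects, {
  k: 2,
  key: (o) => o.x,
  filter: (o) => o.x.a >= 0,
});
console.log(nearest); // [ { x: { a: 0 } }, { x: { a: 2 } } ]
```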

Example usage

Given John Foo's taste for movies:

Attribute    Value  Weight
explosions       8     10%
romance          3     30%
length           6      5%
humour           5      5%
pigeons         10     50%

John Foo would like to rent a movie tonight that most closely matches his movie tastes. He collected a DB of movies with numerical values ranging from 1 to 10 for each of the 5 attributes listed above (don't ask how).

John Foo loves his pigeons. They are the most important attribute to him and hence carry 50% of the weight. He does not like romance and wants to make sure that he avoids sappy movies. He likes mid-length, semi-funny movies with explosions, but he doesn't care about those as much, as long as the movie features peaceful pigeons.

Perfect case for Alike!

Getting started

To install Alike and add it to your package.json:

$ npm install alike --save

Now you can load up the module and use it like so:

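// Load the module; it exports the kNN function
const knn = require('alike');

// Return the 10 closest movies, weighting each attribute
const options = {
  k: 10,
  weights: {
    explosions: 0.1,
    romance: 0.3,
    length: 0.05,
    humour: 0.05,
    pigeons: 0.5
  }
};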

// John Foo's tastes, using the values from the table above
const movieTaste = {
  explosions: 8,
  romance: 3,
  length: 6,
  humour: 5,
  pigeons: 10
};

// movies is your own array of movie objects
const closest = knn(movieTaste, movies, options);

Here, movies is an array of objects that each have at least those 5 attributes. The call returns the 10 movies from the array that are closest to John Foo's tastes. Enjoy! :)
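To see why the weights matter, here is a small worked sketch reduced to two of the attributes (explosions and pigeons, with the weights from above). The two movies and the helpers distanceTo and rank are invented for illustration, and the distance is assumed to be a weighted sum of squared differences. It does not call Alike itself.

```javascript
// Hypothetical data and helpers for illustration; not taken from Alike.
const taste = { explosions: 8, pigeons: 10 };
const weights = { explosions: 0.1, pigeons: 0.5 };
const sampleMovies = [
  { title: 'Explosion Fest', explosions: 8, pigeons: 6 },
  { title: 'Pigeon Peace', explosions: 3, pigeons: 10 },
];

// Weighted squared distance over the attributes present in the subject.
function distanceTo(subject, movie, w = {}) {
  return Object.keys(subject).reduce((sum, attr) => {
    const weight = attr in w ? w[attr] : 1; // assumption: missing weight = 1
    return sum + weight * (subject[attr] - movie[attr]) ** 2;
  }, 0);
}

// Sort a copy of the list by distance and return the titles, closest first.
function rank(subject, list, w) {
  return [...list]
    .sort((x, y) => distanceTo(subject, x, w) - distanceTo(subject, y, w))
    .map((m) => m.title);
}

const unweightedOrder = rank(taste, sampleMovies);
const weightedOrder = rank(taste, sampleMovies, weights);
console.log(unweightedOrder); // [ 'Explosion Fest', 'Pigeon Peace' ]
console.log(weightedOrder);   // [ 'Pigeon Peace', 'Explosion Fest' ]
```

Without weights, the pigeon gap of 4 (squared: 16) is smaller than the explosion gap of 5 (squared: 25), so the explosion movie wins. With pigeons weighted at 0.5 and explosions at 0.1, the distances become 8 and 2.5, and the pigeon movie comes out on top.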

Development

Alike is written in CoffeeScript in the coffee/ folder. You may use make coffee to compile and watch for changes. Unit tests are in the coffee/test/ folder. You can run the tests with npm test or, if you are developing, use make watch-test to watch while you TDD. :)

Benchmarks

Run them with coffee benchmark/; this takes about 1 minute on a MacBook Air.

The benchmarks are designed to reflect realistically sized sets of data. They don't ship with the npm package to keep things light.

Contributors

flockonus mhahmadi

License

Alike is licensed under the terms of the GNU Lesser General Public License, known as the LGPL.