sifter.js

Sifter is a client- and server-side library (via UMD) for textually searching arrays and hashes of objects by property – or multiple properties. It's designed specifically for autocomplete. The process has three steps: score, filter, sort.

  • Supports díåcritîçs.
    For example, if searching for "montana" and an item in the set has a value of "montaña", it will still be matched. Sorting will also play nicely with diacritics.
  • Smart scoring.
    Items are scored / sorted intelligently depending on where a match is found in the string (how close to the beginning) and what percentage of the string matches.
  • Multi-field sorting.
    When scores aren't enough to go by – like when getting results for an empty query – it can sort by one or more fields. For example, sort by a person's first name and last name without actually merging the properties into a single string.
  • Nested properties.
    Search and sort on nested properties, so complex objects can be searched without flattening them: use dot-notation to reference fields (e.g. nested.property).

$ npm install sifter # node.js
$ bower install sifter # browser
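
To make the "smart scoring" feature listed above more concrete, here is a minimal sketch of one possible scoring rule. It is an assumption for illustration, not a copy of Sifter's source: a value scores the ratio of query length to value length, gets a 0.5 bonus when the match starts at the beginning, and the per-field scores are averaged. The helper names scoreValue and scoreItem are hypothetical.

```javascript
// Hypothetical helper (not Sifter's API): score one value against a query.
// Assumed rule: query length / value length, plus 0.5 if the match starts
// at position 0; 0 when there is no match.
function scoreValue(value, query) {
	if (value === undefined || value === null) return 0;
	var haystack = String(value).toLowerCase();
	var needle = String(query).toLowerCase();
	if (!needle.length || !haystack.length) return 0;
	var pos = haystack.indexOf(needle);
	if (pos === -1) return 0;
	var score = needle.length / haystack.length;
	if (pos === 0) score += 0.5;
	return score;
}

// Average the per-field scores so every searched field counts equally.
function scoreItem(item, fields, query) {
	if (!fields.length) return 0;
	var sum = 0;
	for (var i = 0; i < fields.length; i++) {
		sum += scoreValue(item[fields[i]], query);
	}
	return sum / fields.length;
}

var item = {title: 'Annapurna I', location: 'Nepal', continent: 'Asia'};
console.log(scoreItem(item, ['title', 'location', 'continent'], 'anna'));
```

Under these assumptions, "anna" scores about 0.288 against this item: only the title matches (4/11 + 0.5), and the sum is divided by the three searched fields.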

Usage

var sifter = new Sifter([
	{title: 'Annapurna I', location: 'Nepal', continent: 'Asia'},
	{title: 'Annapurna II', location: 'Nepal', continent: 'Asia'},
	{title: 'Annapurna III', location: 'Nepal', continent: 'Asia'},
	{title: 'Eiger', location: 'Switzerland', continent: 'Europe'},
	{title: 'Everest', location: 'Nepal', continent: 'Asia'},
	{title: 'Gannett', location: 'Wyoming', continent: 'North America'},
	{title: 'Denali', location: 'Alaska', continent: 'North America'}
]);

var result = sifter.search('anna', {
	fields: ['title', 'location', 'continent'],
	sort: [{field: 'title', direction: 'asc'}],
	limit: 3
});

Searching returns meta information and an "items" array. Each entry contains the index (or key, if searching a hash) of a matching item and a score that represents how good a match it was. Items that did not match are not returned.

{"score": 0.2878787878787879, "id": 0},
{"score": 0.27777777777777773, "id": 1},
{"score": 0.2692307692307692, "id": 2}
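
Because each entry carries only an "id" and a score, the matched objects have to be looked up in the original data. The sketch below does that with plain array indexing. The data and the items are copied from the example above rather than produced by a live search, and the variable names are illustrative.

```javascript
// Data set copied from the usage example.
var data = [
	{title: 'Annapurna I', location: 'Nepal', continent: 'Asia'},
	{title: 'Annapurna II', location: 'Nepal', continent: 'Asia'},
	{title: 'Annapurna III', location: 'Nepal', continent: 'Asia'},
	{title: 'Eiger', location: 'Switzerland', continent: 'Europe'},
	{title: 'Everest', location: 'Nepal', continent: 'Asia'},
	{title: 'Gannett', location: 'Wyoming', continent: 'North America'},
	{title: 'Denali', location: 'Alaska', continent: 'North America'}
];

// Result items copied from the output shown above.
var items = [
	{score: 0.2878787878787879, id: 0},
	{score: 0.27777777777777773, id: 1},
	{score: 0.2692307692307692, id: 2}
];

// Each "id" is an index into the array that was searched
// (it would be a key if a hash had been searched instead).
var matches = items.map(function (item) {
	return data[item.id];
});

console.log(matches.map(function (m) { return m.title; }).join(', '));
// prints "Annapurna I, Annapurna II, Annapurna III"
```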

Items are sorted primarily by best match. If two or more items have the same score (which will be the case when searching with an empty string), Sifter falls back to the fields listed in the "sort" option.
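
The ordering rule just described can be sketched as a comparator: score first (descending), then each sort field in order. This is an illustration of the rule under simple assumptions (string comparison via localeCompare), not Sifter's own sorting code, and compareBy is a hypothetical name.

```javascript
// Hypothetical comparator builder: higher scores come first; ties are
// broken by the listed sort fields, in the order they are given.
function compareBy(data, sort) {
	return function (a, b) {
		if (a.score !== b.score) return b.score - a.score;
		for (var i = 0; i < sort.length; i++) {
			var field = sort[i].field;
			var dir = sort[i].direction === 'desc' ? -1 : 1;
			var x = String(data[a.id][field]);
			var y = String(data[b.id][field]);
			var cmp = x.localeCompare(y);
			if (cmp !== 0) return cmp * dir;
		}
		return 0;
	};
}

var data = [{title: 'Everest'}, {title: 'Denali'}, {title: 'Eiger'}];

// With an empty query every item has the same score,
// so only the sort fields decide the order.
var items = [{score: 0, id: 0}, {score: 0, id: 1}, {score: 0, id: 2}];
items.sort(compareBy(data, [{field: 'title', direction: 'asc'}]));

console.log(items.map(function (item) { return data[item.id].title; }).join(', '));
// prints "Denali, Eiger, Everest"
```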

The full result has the following format:

{
	"options": {
		"fields": ["title", "location", "continent"],
		"sort": [
			{"field": "title", "direction": "asc"}
		],
		"limit": 3
	},
	"query": "anna",
	"tokens": [{
		"string": "anna",
		"regex": /[aÀÁÂÃÄÅàáâãäå][nÑñ][nÑñ][aÀÁÂÃÄÅàáâãäå]/
	}],
	"total": 3,
	"items": [
		{"score": 0.2878787878787879, "id": 0},
		{"score": 0.27777777777777773, "id": 1},
		{"score": 0.2692307692307692, "id": 2}
	]
}
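
The "regex" entry under "tokens" shows each letter of the query expanded into a character class of its accented variants. A minimal sketch of that idea follows. The DIACRITICS table here is a small, assumed subset covering only "a" and "n", and tokenToRegex is a hypothetical helper, not part of Sifter's API.

```javascript
// Small illustrative map; a real table would cover many more letters.
var DIACRITICS = {
	a: '[aÀÁÂÃÄÅàáâãäå]',
	n: '[nÑñ]'
};

// Escape characters that have special meaning in a regular expression.
function escapeRegex(str) {
	return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive regex in which known letters also match
// their accented forms; all other characters match literally.
function tokenToRegex(token) {
	var source = '';
	for (var i = 0; i < token.length; i++) {
		var ch = token.charAt(i);
		var key = ch.toLowerCase();
		if (Object.prototype.hasOwnProperty.call(DIACRITICS, key)) {
			source += DIACRITICS[key];
		} else {
			source += escapeRegex(ch);
		}
	}
	return new RegExp(source, 'i');
}

var regex = tokenToRegex('montana');
console.log(regex.test('montaña')); // true
console.log(regex.test('Montana')); // true
console.log(regex.test('mountain')); // false
```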

API

#.search(query, options)

Performs a search for query with the provided options.

| Option | Type | Description |
|---|---|---|
| fields | array | An array of property names to be searched. |
| limit | integer | The maximum number of results to return. |
| sort | array | An array of fields to sort by. Each item should be an object containing at least a "field" property. Optionally, direction can be set to "asc" or "desc". The order of the array defines the sort precedence. Unless present, a special "$score" property will be automatically added to the beginning of the sort list. This will make results sorted primarily by match quality (descending). |
| sort_empty | array | Optional. Defaults to "sort" setting. If provided, these sort settings are used when no query is present. |
| filter | boolean | If false, items with a score of zero will not be filtered out of the result-set. |
| conjunction | string | Determines how multiple search terms are joined ("and" or "or", defaults to "or"). |
| nesting | boolean | If true, nested fields will be available for search and sort using dot-notation to reference them (e.g. nested.property). Warning: can reduce performance. |
| respect_word_boundaries | boolean | If true, matches only at start of word boundaries (e.g. the beginning of words, instead of matching the middle of words). |
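
To illustrate what the dot-notation used by the "nesting" option refers to, the sketch below resolves a path such as nested.property against an object. getNested is a hypothetical helper written for this example, and the sample object is made up; it is not a description of how Sifter resolves fields internally.

```javascript
// Hypothetical helper: resolve "a.b.c" against an object, returning
// undefined if any step along the path is missing.
function getNested(obj, path) {
	var parts = path.split('.');
	var current = obj;
	for (var i = 0; i < parts.length; i++) {
		if (current === null || current === undefined) return undefined;
		current = current[parts[i]];
	}
	return current;
}

// Made-up sample object with two levels of nesting.
var peak = {
	title: 'Everest',
	place: {country: 'Nepal', range: {name: 'Himalayas'}}
};

console.log(getNested(peak, 'title')); // Everest
console.log(getNested(peak, 'place.range.name')); // Himalayas
console.log(getNested(peak, 'place.missing.name')); // undefined
```

With data shaped like this, a field such as 'place.country' is the kind of reference that would be listed in "fields" or "sort" when "nesting" is true.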

CLI

Sifter comes with a command line interface that's useful for testing on datasets. It accepts JSON and CSV data, either from a file or from stdin (unix pipes). If using CSV data, the first line of the file must be a header row.

$ npm install -g sifter
$ cat file.csv | sifter --query="ant" --fields=title
$ sifter --query="ant" --fields=title --file=file.csv
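
The header row matters because it supplies the property names that --fields refers to. The sketch below shows the general idea of turning CSV text with a header row into an array of objects. It is a deliberately naive illustration (no support for quoted fields), not the parser the CLI uses, and csvToObjects is a hypothetical name.

```javascript
// Naive CSV reader for illustration only: splits on commas and newlines,
// so quoted fields containing commas or line breaks are not handled.
function csvToObjects(text) {
	var lines = text.split(/\r?\n/).filter(function (line) {
		return line.length > 0;
	});
	if (!lines.length) return [];
	// The first line names the properties of every following row.
	var header = lines[0].split(',');
	return lines.slice(1).map(function (line) {
		var cells = line.split(',');
		var row = {};
		header.forEach(function (name, i) {
			row[name] = cells[i];
		});
		return row;
	});
}

var rows = csvToObjects('title,location\nEiger,Switzerland\nDenali,Alaska\n');
console.log(rows[0].title); // Eiger
console.log(rows[1].location); // Alaska
```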

Contributing

Install the dependencies that are required to build and test:

$ npm install

First build a copy with make, then run the test suite with make test.

When issuing a pull request, please exclude "sifter.js" and "sifter.min.js" in the project root.

License

Copyright © 2013–2015 Brian Reavis & Contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.