Nickel Search

Nickel Search implements a basic serverless word prefix search.

What is prefix search

In a full text search solution, you expect the server to return documents containing the searched words.

In a prefix search solution, you expect the server to return all documents containing words that start with a specific prefix.

Given the advanced querying that almost any full text search engine allows, prefix search is a subset of the full text search problem. For example, with Lucene (and hence Solr, Elastic, and others) you can use the * wildcard to search for prefixed words: adv* returns documents containing adventure, advanced, and other words that start with adv.
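The semantics of prefix search can be illustrated with a naive in-memory scan. This is only a sketch: the `Doc` type, the `words` tokenizer, and `prefixSearch` are hypothetical names, and neither Nickel Search nor a real search engine scans every document on each query.

```typescript
// Naive prefix search, for illustrating the semantics only.
// Doc, words, and prefixSearch are hypothetical, not part of nickel-search.
interface Doc {
    id: string;
    text: string;
}

// Lowercase the text and split it into words on any non-alphanumeric run.
function words(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Return ids of documents containing at least one word that starts with `prefix`.
function prefixSearch(docs: Doc[], prefix: string): string[] {
    const p = prefix.toLowerCase();
    return docs
        .filter((doc) => words(doc.text).some((w) => w.startsWith(p)))
        .map((doc) => doc.id);
}

const docs: Doc[] = [
    { id: "1", text: "An adventure in the mountains" },
    { id: "2", text: "Advanced TypeScript" },
    { id: "3", text: "Cooking for beginners" },
];

console.log(JSON.stringify(prefixSearch(docs, "adv"))); // ["1","2"]
```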

The goal of this project is to allow prefix search in a serverless way, so that you don't have to pay for servers hosting Solr, Elastic, or another search engine.

Current issues and TODO

Multi-word queries are not supported.
Indexing takes a lot of time and RAM.
No support for synonyms or stemming/lemmatization.
No test coverage.
More ranking samples are needed.

How to use

There is a fully functional sample in the /samples directory, which also shows how to run the indexer as a Docker container on AWS Fargate. See README.md in the /samples directory for more info.

Install Nickel Search:

$> npm install nickel-search

Implement your index model and run the indexer:

import nickel from "nickel-search";

// Shape of each JSON document in the data source
interface MyBlogPost {
    Title: string;
    Author: string;
    Body: string;
}

const options = {
    // Set fields that will be returned with search results
    getDisplayedFields: (s3Uri: string, document: MyBlogPost) => ({
        Title: document.Title,
        Author: document.Author,
    }),
    // Set fields to search against
    getSearchedFields: (s3Uri: string, document: MyBlogPost) => ({
        Title: document.Title,
    }),
    // number of search results per page has to be set when creating the index
    resultsPageSize: 50,
    // save checkpoints every 100 changes to each hash value
    saveThreshold: 100,
    // shards in the index store
    indexShards: 1000,
    // Comparator that sets the search results sort order: by weight, then by title
    sort: (a: ISearchable, b: ISearchable) => {
        let sort = a.weight - b.weight;
        if (sort === 0) {
            sort = a.original.Title.localeCompare(b.original.Title);
        }
        return sort;
    },
    // Data source options
    source: nickel.createDataStore<MyBlogPost>({
        location: "../sample-data/", // existing folder with JSON files matching MyBlogPost
    }),
    // Index store options
    indexStore: nickel.createIndexStore({
        location: "../sample-index/", // existing folder that will store the search index
    }),
};

nickel.indexer(options).run();

In the sample above, the indexer will JSON.parse all files in ../sample-data/, apply getDisplayedFields and getSearchedFields to each document, and save the index in ../sample-index/. The indexer splits the index into 1000 shards (indexShards: 1000). The number of shards must be the same when indexing and when searching against the same index.
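To see why the shard count has to match, here is a sketch of one common way to assign keys to shards. It assumes, purely for illustration, that index entries are keyed by word prefix and that the shard is chosen by hashing that key; `fnv1a` and `shardFor` are hypothetical helpers, not Nickel Search's API or its actual hashing scheme.

```typescript
// Hypothetical shard assignment: NOT Nickel Search's actual scheme.

// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a(key: string): number {
    let hash = 0x811c9dc5; // FNV offset basis
    for (let i = 0; i < key.length; i++) {
        hash ^= key.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime modulo 2^32
    }
    return hash >>> 0; // reinterpret as an unsigned 32-bit integer
}

// Choose the shard that stores (or must be queried for) a given prefix.
function shardFor(prefix: string, indexShards: number): number {
    return fnv1a(prefix) % indexShards;
}

const written = shardFor("nic", 1000); // shard the indexer writes to
const read = shardFor("nic", 1000);    // shard the searcher reads from
console.log(written === read); // true
```

Because the shard is `hash % indexShards` in this sketch, a searcher configured with a different shard count would generally compute a different shard for the same prefix and look in the wrong place.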

Run the indexer. When it's done, run the search:

import nickel from "nickel-search";

const indexStore = nickel.createIndexStore({
    location: "../sample-index/", // search index location
});

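// indexShards must match the value used by the indexer
const ns = nickel.searcher({ indexShards: 1000 }, indexStore);

// Top-level await requires an ES module; otherwise call this inside an async function
const searchResults = await ns.search("nic");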

See the complete example in the /samples directory.

Requirements

The indexer can take a long time to run.
- In theory, most time-consuming tasks could run in parallel, but this is not implemented.
It stores the entire index in RAM before saving it, so it requires a lot of RAM.

Features

When to use Nickel Search

Nickel can help if all of the following are true:

You have a set of text documents that you want to search by word prefix
Your dataset does not change often
You don't need advanced query syntax such as that provided by Lucene or other implementations
You don't want to pay for an always-on search server (such as Elastic or Solr)

A simple example scenario is autocomplete search over book titles, which doesn't need advanced full text query syntax such as that provided by Lucene or other implementations. Many other autocomplete scenarios can be addressed in the same way.

When not to use Nickel Search

Don't use Nickel Search if:

You need to rank results when querying
You have KPIs on index update time
You need advanced syntax querying (AND/OR/etc.)
You need to get a response in less than 100ms
Your dataset is larger than RAM available for indexing
You need to search text in languages other than English (or consider submitting a PR to support your language)

How it works

Nickel Search is a Node.js app that converts a set of documents into a prefix-queryable set of documents, so that you can use your storage system as your prefix search server. I use it with AWS S3, which gives my projects a serverless search.
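One way to picture this conversion is an index in which every word prefix is a key whose value lists the matching documents, so that a search becomes a single key lookup in the storage system. The sketch below assumes that layout for illustration only (this README does not specify Nickel's index format); `tokenize`, `buildPrefixIndex`, and `search` are hypothetical helpers, and a `Map` stands in for an object store such as S3.

```typescript
// Hypothetical precomputed prefix index; a Map stands in for an object store
// where each key could be an object and each search a single GET.

// Lowercase words, split on non-alphanumeric characters (illustrative tokenizer).
function tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Build prefix -> sorted, de-duplicated document ids.
function buildPrefixIndex(docs: Record<string, string>): Map<string, string[]> {
    const index = new Map<string, Set<string>>();
    for (const [id, text] of Object.entries(docs)) {
        for (const word of tokenize(text)) {
            // Register every prefix of the word: "n", "ni", "nic", ...
            for (let len = 1; len <= word.length; len++) {
                const prefix = word.slice(0, len);
                let ids = index.get(prefix);
                if (!ids) {
                    ids = new Set<string>();
                    index.set(prefix, ids);
                }
                ids.add(id);
            }
        }
    }
    const result = new Map<string, string[]>();
    index.forEach((ids, prefix) => result.set(prefix, Array.from(ids).sort()));
    return result;
}

// Searching is a plain key lookup; no query processing happens at search time.
function search(index: Map<string, string[]>, prefix: string): string[] {
    return index.get(prefix.toLowerCase()) ?? [];
}

const index = buildPrefixIndex({
    "book-1.json": "Nickel and Dimed",
    "book-2.json": "The Nice Guys",
    "book-3.json": "Moby Dick",
});
console.log(JSON.stringify(search(index, "nic"))); // ["book-1.json","book-2.json"]
```

In this sketch every prefix of every word is stored, so the index is much larger than the source text; that illustrates why an approach of this kind can need a lot of time and RAM at indexing time while keeping queries trivial.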

Future steps

TODO:

Deallocate the stack after indexing is done, keeping the source and target S3 buckets:
- Move the S3 bucket definitions to a different stack, and reference them from the current stack
- Or delete cost-incurring resources from the created stack
Add storage to the Docker container before indexing starts.
Remove storage from the Docker container when indexing finishes.
Create a project directory for fabu.
Make indexer resumable.
Optimize time and memory usage.
Try other features of mature full text search solutions and see if they can be added to Nickel.

Release notes

v0.3

Changed the tokenizer to split on more punctuation marks
Added local file buffer to reduce RAM consumption
Enhanced sorting performance
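A tokenizer that splits on punctuation as well as whitespace can be sketched as follows. This is a hypothetical illustration of the idea behind the v0.3 tokenizer change, not Nickel Search's actual tokenizer; the separator list in `SEPARATORS` is an assumption.

```typescript
// Hypothetical tokenizer: whitespace plus an illustrative set of punctuation
// marks act as separators. Not Nickel Search's actual rules.
const SEPARATORS = /[\s.,;:!?"'()[\]{}<>\/\\|-]+/;

function tokenize(text: string): string[] {
    return text
        .toLowerCase()
        .split(SEPARATORS)
        .filter((token) => token.length > 0); // drop empty strings at the edges
}

console.log(JSON.stringify(tokenize("Hello, world! Re-index: v0.3 (beta)")));
// ["hello","world","re","index","v0","3","beta"]
```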

aynurin/nickel-search