Nickel Search implements a basic serverless word prefix search.
In a full text search solution, you expect the server to return documents containing the searched words.
In a prefix search solution, you expect the server to search for all documents containing words starting with a specific prefix.
Given the advanced querying that almost any full text search engine allows, prefix search is a subset of the full text search problem. For example, with Lucene (and hence Solr, Elastic, and others) you can use the `*` syntax to search for prefixed words. E.g., `adv*` would return documents containing `adventure`, `advanced`, and other words that start with `adv`.
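To make the idea of a prefix match concrete, here is a minimal sketch in TypeScript. The helper `wordsWithPrefix` and its naive tokenizer are hypothetical and written only for illustration; they are not part of Nickel Search or Lucene.

```typescript
// Hypothetical helper, not part of Nickel Search: returns the words in
// `text` that start with `prefix`, ignoring case.
function wordsWithPrefix(text: string, prefix: string): string[] {
    const needle = prefix.toLowerCase();
    return text
        .toLowerCase()
        .split(/[^a-z0-9]+/) // naive tokenizer: split on anything that is not a-z or 0-9
        .filter((word) => word.length > 0 && word.startsWith(needle));
}

const matches = wordsWithPrefix(
    "An advanced adventure in advertising, not addition",
    "adv",
);
console.log(matches); // [ 'advanced', 'adventure', 'advertising' ]
```

A document matches the query `adv*` when this list is non-empty for at least one of its searched fields.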
The goal of this project is to allow prefix search in a serverless way, so that you don't have to pay for servers hosting Solr, Elastic, or another search engine.
- The search doesn't support multi-word search.
- The indexing takes a lot of time and RAM.
- No support for synonyms, stemming/lemmatization.
- No test coverage.
- More ranking samples needed.
There is a fully functional sample in the /samples directory, which also includes running the indexer as a Docker container on AWS Fargate. See README.md in the /samples directory for more info.
Install Nickel Search:
$> npm install nickel-search
Implement your index model and run indexer:
import nickel from "nickel-search";

class MyBlogPost {
    Title: string;
    Author: string;
    Body: string;
}

const options = {
    // Set fields that will be returned with search results
    getDisplayedFields: (s3Uri: string, document: MyBlogPost) => ({
        Title: document.Title,
        Author: document.Author,
    }),
    // Set fields to search against
    getSearchedFields: (s3Uri: string, document: MyBlogPost) => ({
        Title: document.Title,
    }),
    // Number of search results per page; has to be set when creating the index
    resultsPageSize: 50,
    // Save checkpoints every 100 changes to each hash value
    saveThreshold: 100,
    // Number of shards in the index store
    indexShards: 1000,
    // Implement to set the sort order of search results
    sort: (a: ISearchable, b: ISearchable) => {
        let sort = a.weight - b.weight;
        if (sort === 0) {
            sort = a.original.Title.localeCompare(b.original.Title);
        }
        return sort;
    },
    // Data source options
    source: nickel.createDataStore<MyBlogPost>({
        location: "../sample-data/", // existing folder with JSON files matching MyBlogPost
    }),
    // Index store options
    indexStore: nickel.createIndexStore({
        location: "../sample-index/", // existing folder that will store the search index
    }),
};

nickel.indexer(options).run();
In the sample above, the indexer will `JSON.parse` all files in `../sample-data/`, apply `getDisplayedFields` and `getSearchedFields` to each file, and save the index in `../sample-index/`. The indexer will split the index into 1000 shards (`indexShards: 1000`). The number of shards must be the same when indexing and when searching against the same index.
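One way to see why the shard count must match is to assume that a shard is chosen by hashing the search key modulo the number of shards. The sketch below is written under that assumption only; `shardFor` and its hash are hypothetical and are not taken from the Nickel Search source, which may pick shards differently.

```typescript
// Hypothetical sharding function, NOT taken from Nickel Search.
// Maps a key to a shard number in the range [0, indexShards).
function shardFor(key: string, indexShards: number): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
        // 31-based rolling hash, kept in the unsigned 32-bit range
        hash = (Math.imul(hash, 31) + key.charCodeAt(i)) >>> 0;
    }
    return hash % indexShards;
}

const writtenTo = shardFor("nic", 1000); // shard chosen at indexing time
const readFrom = shardFor("nic", 300);   // shard chosen by a searcher using a different count
console.log(writtenTo, readFrom, writtenTo === readFrom);
```

Under this scheme, a searcher configured with a different `indexShards` value would usually look in a shard the indexer never wrote the key to, and so would find nothing.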
Run the indexer. When it's done, run the search:
import nickel from "nickel-search";
const indexStore = nickel.createIndexStore({
    location: "../sample-index/", // search index location
});
const ns = nickel.searcher({ indexShards: 1000 }, indexStore);
const searchResults = await ns.search('nic');
See an example in the ./samples directory.
- The indexer can take a long time to run.
- In theory, the most time-consuming tasks could run in parallel, but this is not implemented.
- The indexer holds the entire index in RAM before saving it, so it requires a lot of RAM.
Nickel can help if all of the following are true:
- You have a set of text documents that you want to be able to search using prefixes
- Your dataset does not change often
- You don't need advanced query syntax such as that provided by Lucene or other implementations
- You don't want to pay for an always-on search server (such as Elastic or Solr)
A simple example scenario is an autocomplete search for book names. It doesn't need the advanced full text search query syntax provided by Lucene or other implementations. Many other autocomplete scenarios can be addressed in the same way.
Don't use Nickel Search if:
- You need to rank results when querying
- You have KPIs on index update time
- You need advanced syntax querying (AND/OR/etc.)
- You need to get a response in less than 100ms
- Your dataset is larger than RAM available for indexing
- You need to search in languages other than English (or maybe submit a PR to support that language?)
Nickel Search is a Node.js app that converts a set of documents into a prefix-queryable set of documents, so that you can use the capabilities of the storage system as your prefix-search server. I use it with AWS S3, so it provides a serverless search for my projects.
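The general idea of a prefix-queryable set of documents can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Nickel Search index format: `buildPrefixIndex` is a hypothetical function, the document IDs and titles are made up, and a `Map` stands in for a key-value store such as S3, where each prefix would be an object key.

```typescript
// Illustrative only: an in-memory stand-in for a key-value store such as S3.
// Each key is a word prefix; each value lists the documents containing a
// word that starts with that prefix.
type PrefixIndex = Map<string, string[]>;

// Hypothetical function, not the Nickel Search indexer.
function buildPrefixIndex(docs: Record<string, string>, minPrefix = 1): PrefixIndex {
    const index: PrefixIndex = new Map();
    for (const [docId, title] of Object.entries(docs)) {
        const words = title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
        for (const word of words) {
            // Register the document under every prefix of the word
            for (let len = minPrefix; len <= word.length; len++) {
                const prefix = word.slice(0, len);
                const ids = index.get(prefix) ?? [];
                if (!ids.includes(docId)) {
                    ids.push(docId);
                }
                index.set(prefix, ids);
            }
        }
    }
    return index;
}

const index = buildPrefixIndex({
    "books/1.json": "Nickel and Dimed",
    "books/2.json": "The Nickel Boys",
    "books/3.json": "Night",
});

// A search is then a single key lookup, with no query engine involved.
const hits = index.get("nic") ?? [];
console.log(hits); // [ 'books/1.json', 'books/2.json' ]
```

Because all the work happens at indexing time, the lookup itself needs nothing more than a "get object by key" operation, which is what makes a plain storage service usable as the search backend. The trade-off is an index much larger than the source data and a slow indexing step.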
TODO:
- Deallocate the stack after indexing is done, keeping the source and target S3 buckets:
  - Move the S3 bucket definitions to a different stack, and reference them from the current stack
  - Or delete money-consuming objects from the created stack
- Add storage to the Docker container before indexing starts.
- Remove storage from the Docker container when indexing finishes.
- Create a project directory for fabu.
- Make the indexer resumable.
- Optimize time and memory usage.
- Try other features of mature full text search solutions and see if they can be added to Nickel.
- Changed the tokenizer to split on more punctuation marks
- Added local file buffer to reduce RAM consumption
- Enhanced sorting performance