Clone the repo.
To run the search-module on its own, update the data.json file in /search-module/src, then run:
cd src/
node preprocess
The above command preprocesses your data.json file to enable the search feature.
cd books-search
Install the dependencies by running:
npm install
To preprocess your data.json file run:
node preprocess
To start the server, run:
npm start
The server should now be up and running. If you want different data for your books, update the data.json file in the following format:
{
  "titles": ["Book title"],
  "summaries": [
    {
      "id": 0,
      "summary": "Book summary here"
    }
  ],
  "authors": [
    {
      "book_id": 0,
      "author": "book author"
    }
  ]
}
The titles, summaries, and authors lists must map one-to-one: each id (and book_id) is the index of the corresponding book title in the titles list.
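The mapping rule above can be checked before preprocessing. The following is a minimal sketch, not part of the repo: validateData is a hypothetical helper, and it assumes that "one-to-one" means every id and book_id is a unique, valid index into titles.

```javascript
// Hypothetical helper (not in the repo): returns a list of problems found
// in a data.json-shaped object, or an empty list if the mapping looks valid.
function validateData(data) {
  const { titles = [], summaries = [], authors = [] } = data;
  const errors = [];
  const n = titles.length;

  if (summaries.length !== n) {
    errors.push(`expected ${n} summaries, found ${summaries.length}`);
  }
  if (authors.length !== n) {
    errors.push(`expected ${n} authors, found ${authors.length}`);
  }

  // Assumption: each id must be an index into titles and appear only once.
  const checkIds = (label, ids) => {
    const seen = new Set();
    ids.forEach((id) => {
      if (!Number.isInteger(id) || id < 0 || id >= n) {
        errors.push(`${label} ${id} does not point at a title`);
      } else if (seen.has(id)) {
        errors.push(`${label} ${id} appears more than once`);
      }
      seen.add(id);
    });
  };

  checkIds('summary id', summaries.map((s) => s.id));
  checkIds('author book_id', authors.map((a) => a.book_id));
  return errors;
}
```

Calling validateData on the parsed contents of data.json and logging the returned messages gives a quick sanity check before running node preprocess.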
This repo consists of two parts:
Search-Engine is a JavaScript module that searches the data.json file (a list of titles, summaries, and authors) for an input string and returns a list of k book objects, each containing a title, summary, and author.
The data.json file is first preprocessed to create a preprocessedData.json file, which maps each word to a list of book_id values together with the number of occurrences of that word in the corresponding summary.
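To make the shape of that word map concrete, here is a minimal sketch of building such an index. It is an illustration only: buildIndex and tokenize are hypothetical names, the tokenizer (lowercase, split on non-alphanumeric characters) is an assumption, and the actual layout of preprocessedData.json may differ.

```javascript
// Hypothetical tokenizer: lowercases the text and keeps runs of
// letters, digits and apostrophes. The repo may tokenize differently.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Builds a map of word -> [{ book_id, count }] from the summaries list.
function buildIndex(summaries) {
  // A prototype-less object, so words like "constructor" are plain keys.
  const index = Object.create(null);

  for (const { id, summary } of summaries) {
    // Count how often each word occurs in this one summary.
    const counts = new Map();
    for (const word of tokenize(summary)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
    // Append one posting per distinct word in this summary.
    for (const [word, count] of counts) {
      if (!(word in index)) index[word] = [];
      index[word].push({ book_id: id, count });
    }
  }
  return index;
}
```

Because summaries are visited in list order, each word's postings end up sorted by book_id whenever the ids in data.json are ascending, which is convenient for merging lists at search time.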
After preprocessing is done, we can search the preprocessed data to find the books whose summaries contain the input string.
The module uses the KMP algorithm to construct the preprocessedData.json file. To return the list of books, it uses an m-pointers algorithm (where m is the number of words in the input string).
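For readers unfamiliar with the two techniques named above, the sketch below shows a textbook KMP occurrence counter and one common reading of an m-pointers search: intersecting m sorted lists of book ids, one list per query word. The function names are hypothetical, and how the repo actually wires these together is not shown here.

```javascript
// KMP failure table: fail[i] is the length of the longest proper prefix
// of pattern[0..i] that is also a suffix of it.
function buildFailureTable(pattern) {
  const fail = new Array(pattern.length).fill(0);
  let k = 0;
  for (let i = 1; i < pattern.length; i++) {
    while (k > 0 && pattern[i] !== pattern[k]) k = fail[k - 1];
    if (pattern[i] === pattern[k]) k++;
    fail[i] = k;
  }
  return fail;
}

// Counts (possibly overlapping) occurrences of pattern in text.
function countOccurrences(text, pattern) {
  if (pattern.length === 0) return 0;
  const fail = buildFailureTable(pattern);
  let count = 0;
  let k = 0; // number of pattern characters currently matched
  for (let i = 0; i < text.length; i++) {
    while (k > 0 && text[i] !== pattern[k]) k = fail[k - 1];
    if (text[i] === pattern[k]) k++;
    if (k === pattern.length) {
      count++;
      k = fail[k - 1]; // keep going to find overlapping matches
    }
  }
  return count;
}

// Assumed m-pointers step: one pointer per ascending list of book ids,
// returning the ids that appear in every list.
function intersectSorted(lists) {
  if (lists.length === 0) return [];
  const pointers = new Array(lists.length).fill(0);
  const result = [];
  while (true) {
    // The largest current head is the only candidate that can still match.
    let max = -Infinity;
    for (let j = 0; j < lists.length; j++) {
      if (pointers[j] >= lists[j].length) return result;
      max = Math.max(max, lists[j][pointers[j]]);
    }
    let allEqual = true;
    for (let j = 0; j < lists.length; j++) {
      // Advance every pointer that is behind the candidate.
      while (pointers[j] < lists[j].length && lists[j][pointers[j]] < max) {
        pointers[j]++;
      }
      if (pointers[j] >= lists[j].length) return result;
      if (lists[j][pointers[j]] !== max) allEqual = false;
    }
    if (allEqual) {
      result.push(max);
      for (let j = 0; j < lists.length; j++) pointers[j]++;
    }
  }
}
```

The intersection relies on each list being sorted in ascending order, so every pointer only ever moves forward and the total work is linear in the combined length of the lists.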
Books-Search is a React app created using create-react-app. It lets users search summaries through an autocomplete input, and lists the titles corresponding to that summary. It uses the above search-module to implement the search functionality.
It has a responsive web design and an attractive user interface.
The search-module has two tests, preprocess.test and search.test. preprocess.test runs when the mockPreprocessedData.json file contains an empty object. Similarly, search.test runs for the two exposed functions of the search module, preprocessData and searchBooks.
To run this module, you will need node >= 12.0 on your local machine.
The current implementation is for a set of 50 books, with room to scale. To extend it to 10^6 books, we can make a few assumptions:
- No summary contains more than 10^4 distinct words.
- If any keyword occurs more than 10^4 times, we do not store all of its book ids while preprocessing; we keep only the MIN_COUNT most relevant occurrences.
- If any keyword occurs fewer than MIN_COUNT times, we do not store it in the preprocessed data.
- More RAM is needed to run the code in memory.
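The two pruning rules above could be sketched as follows. This is an illustration under several assumptions that the README does not confirm: pruneIndex, maxPostings, and minCount are hypothetical names, a keyword's occurrence count is taken to be the number of books it appears in, and relevance is taken to be the per-summary word count.

```javascript
// Hypothetical pruning pass over a word -> [{ book_id, count }] index.
// maxPostings plays the role of the 10^4 threshold, minCount of MIN_COUNT.
function pruneIndex(index, { maxPostings, minCount }) {
  const pruned = Object.create(null);

  for (const [word, postings] of Object.entries(index)) {
    // Rule: keywords that occur fewer than minCount times are not stored.
    if (postings.length < minCount) continue;

    if (postings.length > maxPostings) {
      // Rule: for very common keywords keep only the minCount most
      // relevant postings (assumed: highest count), then restore the
      // book_id order so the lists can still be merged at search time.
      pruned[word] = [...postings]
        .sort((a, b) => b.count - a.count)
        .slice(0, minCount)
        .sort((a, b) => a.book_id - b.book_id);
    } else {
      pruned[word] = postings;
    }
  }
  return pruned;
}
```

With 10^6 books, values such as maxPostings = 10000 would match the thresholds listed above, while a suitable MIN_COUNT would have to be chosen from the data.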
For the challenges I faced, please see CHALLANGES.md.