- Remove special characters from summary
- Split summary into individual words
- Find the frequency of each extracted word across all summaries and store it in an object (an object is used instead of a Map because the keys are only strings) with the properties instances, totalFrequency, totalInstances and rank (a few keywords, such as adjectives, are given a fixed rank of 0; for all other words, rank equals the word's length)
- Repeat this process for all words of all summaries
- Once all the words are processed, sort the instances of every key in the object
- Store this object as a cache in JSON format
- read the cache.json file and data.json
- store all summaries present in data.json file in a variable
- if cache.json is empty, call the preprocess function to create the cache file; otherwise, parse the cache.json file
- extract all possible substrings of the user query that are one or more words long
- calculate the instances of each extracted substring in all summaries stored in the allSummaries variable: if a substring is already present in cachewords, skip it; otherwise, calculate its instances and store the properties instances, totalFrequency, totalInstances and rank (rank equals the substring's word length)
- store all the substring instances in an object
- create a map from the object created above
- sort the map by rank (keys with a higher rank get higher priority; if the ranks are equal, the key with the higher totalFrequency/totalInstances gets the next priority)
- loop through the values of the map and return the top k results
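
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual search.js: the sample summaries and the helper names (tokenize, countInstances, querySubstrings, compareEntries, search) are hypothetical. It assumes that rank is the number of words in a substring, that ties are broken by the ratio totalFrequency/totalInstances, and that a summary is returned only once. The cache lookup and the fixed rank of 0 for selected keywords are left out for brevity.

```javascript
// Hypothetical sample data; the real summaries come from data.json.
const allSummaries = [
  "The book is about habits that make people effective.",
  "A practical guide to building good habits.",
  "An effective guide to personal finance.",
];

// Remove special characters and split the text into lowercase words.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Count how often a phrase (one or more words) occurs in each summary.
function countInstances(phrase, summaries) {
  const words = tokenize(phrase);
  const instances = [];
  let totalFrequency = 0;
  summaries.forEach((summary, id) => {
    const tokens = tokenize(summary);
    let frequency = 0;
    for (let i = 0; i + words.length <= tokens.length; i++) {
      if (words.every((w, j) => tokens[i + j] === w)) frequency++;
    }
    if (frequency > 0) {
      instances.push({ id, frequency });
      totalFrequency += frequency;
    }
  });
  // Summaries with the most occurrences come first.
  instances.sort((a, b) => b.frequency - a.frequency);
  return { instances, totalFrequency, totalInstances: instances.length, rank: words.length };
}

// All contiguous word sequences of the query, one or more words long.
function querySubstrings(query) {
  const words = tokenize(query);
  const result = [];
  for (let start = 0; start < words.length; start++) {
    for (let end = start + 1; end <= words.length; end++) {
      result.push(words.slice(start, end).join(" "));
    }
  }
  return result;
}

// Higher rank first; on equal rank, higher totalFrequency/totalInstances first.
function compareEntries(a, b) {
  if (a.rank !== b.rank) return b.rank - a.rank;
  return b.totalFrequency / b.totalInstances - a.totalFrequency / a.totalInstances;
}

// Return up to k summaries, best-matching substrings first.
function search(query, summaries, k) {
  const entries = new Map();
  for (const phrase of querySubstrings(query)) {
    if (entries.has(phrase)) continue;
    const stats = countInstances(phrase, summaries);
    if (stats.totalInstances > 0) entries.set(phrase, stats);
  }
  const ranked = [...entries.values()].sort(compareEntries);
  const seen = new Set();
  const results = [];
  for (const entry of ranked) {
    for (const { id } of entry.instances) {
      if (seen.has(id)) continue;
      seen.add(id);
      results.push(summaries[id]);
      if (results.length === k) return results;
    }
  }
  return results;
}

console.log(search("good habits", allSummaries, 2));
```

In this sketch the two-word match "good habits" outranks the single-word matches, so the summary containing the full phrase is listed first.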
Run the command node search.js in a terminal.
For Q2, the above search function is used as a helper method to return the data from the Node.js server, with a few modifications. To fetch the matching data for a user query, an API call is made to the /summaries endpoint with the query text as a query parameter, and the response is received in JSON format.
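
A client-side call to this endpoint might look like the sketch below. The base URL, the query parameter name q and the function names buildSummariesUrl and fetchSummaries are assumptions made for illustration; only the /summaries path and the JSON response come from the description above. It relies on the global fetch available in Node.js 18 and later.

```javascript
// Build the request URL. The parameter name "q" is an assumption.
function buildSummariesUrl(baseUrl, queryText) {
  const url = new URL("/summaries", baseUrl);
  url.searchParams.set("q", queryText);
  return url.toString();
}

// Call the endpoint and parse the JSON response.
async function fetchSummaries(baseUrl, queryText) {
  const response = await fetch(buildSummariesUrl(baseUrl, queryText));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Hypothetical usage against a locally running server:
// fetchSummaries("http://localhost:3000", "good habits").then(console.log);
```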
Improvements:
- Tokenize the words in a better way by handling special characters too.
- Match search terms that contain typos as well.
- Filter results based on the proximity of words.
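
As one possible starting point for the typo-matching idea, the sketch below computes the Levenshtein edit distance between two words. This technique is a suggestion and not part of the current implementation; the function name editDistance and any matching threshold are hypothetical.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn string a into string b.
function editDistance(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance("kitten", "sitting")); // 3
```

A query word could then be treated as matching a cached word when the distance is below a small threshold, for example 1 or 2, at the cost of extra comparisons per query.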