Search Engine Exam
Your task is to code a simple efficient search engine that :
- can be added with multiple documents
- can search all documents that contain a certain list of terms
This is the interface that your class needs to implement :
public interface SearchEngine {
public void add(Document document);
public List<DocumentId> search(List<String> terms);
}
A Document object is composed from an id and text which is split into terms automatically :
public class Document {
final String[] terms;
private final DocumentId id;
public Document(DocumentId id, String text){
this.id = id;
terms = text.split(" ");
}
public String[] getTerms(){ return this.terms; }
public DocumentId getId(){ return this.id; }
}
The search method will return all the document ids of all the documents that contain all the search terms. A brute force implementation can be found in the class SimpleSearchEngine.java, and a simple demonstration of what the class does can be found in this simple unit test :
@Test
public void testSimple(){
SimpleSearchEngine searchEngine = new SimpleSearchEngine();
Arrays.asList(
new Document(new DocumentId(0) , "hello world"),
new Document(new DocumentId(1) , "shalom olam dog"),
new Document(new DocumentId(2) , "foo bar food"),
new Document(new DocumentId(3) , "dog cat world"),
new Document(new DocumentId(4) , "hello cat")).forEach(searchEngine::add);
assertEquals(Arrays.asList(), searchEngine.search(Arrays.asList("")));
assertEquals(Arrays.asList(new DocumentId(3), new DocumentId(4)), searchEngine.search(Arrays.asList("cat")));
assertEquals(Arrays.asList(new DocumentId(2)), searchEngine.search(Arrays.asList("foo")));
assertEquals(Collections.emptyList(), searchEngine.search(Arrays.asList("kuku")));
}
Your task is to do the following :
- Fork this repository with your own github account, do commits into your own repository - I'd like to see you progress so commit every once in a while so I can see your progress
- Implement an efficient implementation of SearchEngine
- You have to use Java 8 in order to run this project. You can choose to use gradle or not - whatever you prefer
- Test your implementation with the two provided tests, add more tests of your own if you feel that there are corner cases that haven't been provided
- You can use any tool that you want or you can build the entire thing yourself - the important thing is that you do it on your own
- Would be great to compare your results against the naive algorithm - if your implementation is good it should show considerable improvement