The task is to implement a simple search engine on top of source code repositories.
Your program receives a path to a directory that contains source code. It will traverse the directory recursively and index the text files. Your program will provide users the ability to query the indexed files.
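The traversal step could be sketched as follows. This is only an illustration under stated assumptions: the function name `iter_text_files` is hypothetical, and "text file" is taken to mean "a file that decodes as UTF-8", which the task description does not specify.

```python
import os
from typing import Iterator, Tuple


def iter_text_files(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path relative to root, content) for each text file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            try:
                with open(full_path, encoding="utf-8") as handle:
                    content = handle.read()
            except (UnicodeDecodeError, OSError):
                # Assumption: files that are unreadable or not valid UTF-8
                # are treated as non-text and skipped.
                continue
            yield os.path.relpath(full_path, root), content
```

Yielding paths relative to the indexed directory matches the way files are listed in the expected output later in this document.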
Imagine you have 2 documents, `doc1` and `doc2`. `doc1` contains the words `a` and `b`. `doc2` contains the words `a` and `c`. Let's say that `doc1` has number 1 and `doc2` has number 2. A reverse index is a data structure that looks like this:
a - 1,2
b - 1
c - 2
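The structure above can be built with a dictionary that maps each word to the set of document numbers containing it. The sketch below is one possible way to do it; the names `tokenize` and `build_index` are hypothetical, and splitting on runs of word characters (case-sensitively) is an assumption, since the task does not define what counts as a word.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into words (assumption: runs of letters, digits, underscores)."""
    return re.findall(r"\w+", text)


def build_index(documents: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every word to the set of document numbers that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


# The two documents from the example above.
documents = {1: "a b", 2: "a c"}
index = build_index(documents)
for word in sorted(index):
    print(word, "-", ",".join(str(doc_id) for doc_id in sorted(index[word])))
# prints:
# a - 1,2
# b - 1
# c - 2
```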
- `index ${abs_path_to_dir}` - creates index directory for the directory
- `search ${query}` - searches the reverse index and returns the files that match the `${query}`
- `+word` - document must contain this word
- `-word` - document cannot contain this word
- `word` - optional words - if provided then document must contain at least one of these words
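A query in this syntax can be split into three groups of words before it is evaluated. The following is a minimal sketch; the `Query` type and `parse_query` function are hypothetical names, and ignoring a lone `+` or `-` is an assumption that the task does not cover.

```python
from typing import NamedTuple, Set


class Query(NamedTuple):
    required: Set[str]  # words written as +word
    excluded: Set[str]  # words written as -word
    optional: Set[str]  # words written without a prefix


def parse_query(query: str) -> Query:
    """Sort the whitespace-separated terms of a query into three groups."""
    required: Set[str] = set()
    excluded: Set[str] = set()
    optional: Set[str] = set()
    for term in query.split():
        prefix, word = (term[0], term[1:]) if term[0] in "+-" else ("", term)
        if not word:
            continue  # assumption: a bare "+" or "-" carries no word, skip it
        if prefix == "+":
            required.add(word)
        elif prefix == "-":
            excluded.add(word)
        else:
            optional.add(word)
    return Query(required, excluded, optional)
```

For instance, `parse_query("+main -int")` puts `main` into `required` and `int` into `excluded`, leaving `optional` empty.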
- `+class` - finds all files that contain the word `class`
- `+main -int` - finds all files that contain the word `main` but do not contain the word `int`
- `+String equals trim` - finds all files that contain the word `String` and either `equals` or `trim` or both
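These rules map naturally onto set operations over the reverse index: intersection for required words, difference for forbidden words, and a union for the optional words. The sketch below shows one way to express that; the function name `evaluate` and its parameters are hypothetical, and starting from the set of all documents (so that a query with only `-word` terms returns every other document) is an assumption, not something the task states.

```python
from typing import Dict, Iterable, Set


def evaluate(
    index: Dict[str, Set[int]],
    all_docs: Iterable[int],
    required: Set[str],
    excluded: Set[str],
    optional: Set[str],
) -> Set[int]:
    """Return the document numbers that satisfy all three groups of words."""
    result = set(all_docs)
    for word in required:
        result &= index.get(word, set())  # must contain every +word
    for word in excluded:
        result -= index.get(word, set())  # must not contain any -word
    if optional:
        any_optional: Set[int] = set()
        for word in optional:
            any_optional |= index.get(word, set())
        result &= any_optional  # must contain at least one optional word
    return result


# The two-document reverse index from earlier in this document.
index = {"a": {1, 2}, "b": {1}, "c": {2}}
docs = {1, 2}
print(sorted(evaluate(index, docs, {"a"}, set(), set())))       # prints [1, 2]
print(sorted(evaluate(index, docs, {"a"}, {"b"}, set())))       # prints [2]
print(sorted(evaluate(index, docs, set(), set(), {"b", "c"})))  # prints [1, 2]
```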
- commands are read from standard input ("CLI")
- `index`:
  - Successfully created index for ${directory_name}.
- `search`:
  - Found ${x} documents for ${query}. List of files:
    * file1
    * file2
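The command loop and the output messages could be sketched as follows. All names here (`format_index_message`, `format_search_message`, `run_cli`) are hypothetical, the indexing and searching logic is passed in as callbacks rather than implemented, and the "Unknown command" message is an assumption. Quoting the query and printing only the last component of the directory path follow the examples in this document; sorting the file list is an assumption.

```python
import os
import sys
from typing import Callable, Iterable, List, TextIO


def format_index_message(directory: str) -> str:
    """Build the confirmation line, using only the directory's own name."""
    name = os.path.basename(os.path.normpath(directory))
    return f"Successfully created index for {name}."


def format_search_message(query: str, files: List[str]) -> str:
    """Build the result header followed by one '* file' line per match."""
    lines = [f'Found {len(files)} documents for "{query}". List of files:']
    lines += [f"* {path}" for path in sorted(files)]
    return "\n".join(lines)


def run_cli(
    lines: Iterable[str],
    build_index: Callable[[str], None],
    search: Callable[[str], List[str]],
    out: TextIO = sys.stdout,
) -> None:
    """Read one command per line and print the corresponding message."""
    for line in lines:
        command, _, argument = line.strip().partition(" ")
        argument = argument.strip()
        if command == "index":
            build_index(argument)
            print(format_index_message(argument), file=out)
        elif command == "search":
            print(format_search_message(argument, search(argument)), file=out)
        elif command:
            # Assumption: the task does not say how to report bad input.
            print(f"Unknown command: {command}", file=out)
```

Passing `sys.stdin` as `lines` would read commands from standard input, with `build_index` and `search` supplied by the rest of the program.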
Command: index tests/SimpleTest
Expected output: Successfully created index for SimpleTest.
Command: search +a
Expected output:
Found 2 documents for "+a". List of files:
* doc1.txt
* doc2.txt
Command: search +a -b
Expected output:
Found 1 documents for "+a -b". List of files:
* doc2.txt
Command: search b c
Expected output:
Found 2 documents for "b c". List of files:
* doc1.txt
* doc2.txt
Command: index tests/ComplexTest
Expected output: Successfully created index for ComplexTest.
Command: search +MyFileUtils
Expected output:
Found 2 documents for "+MyFileUtils". List of files:
* AntExercise/src/cz/cuni/mff/fileutils/MyFileUtils.java
* README.md
Good luck :)