Step 2 [5 points] - Indexing
dkutin opened this issue · 0 comments
Build an inverted index, with an entry for each word in the vocabulary. You can use any appropriate data structure (hash table, linked lists, Access database, etc.). An example of a possible index is presented below.
- Input: Tokens obtained from the preprocessing module
- Output: An inverted index for fast access
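As a rough illustration of the input and output described above, here is a minimal sketch of one possible index built on a Python dictionary (a hash table). The function name `build_inverted_index`, the `docs` layout (document ID mapped to a token list), and the sample documents are assumptions made for this example, not part of the assignment.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Build a term -> {doc_id: term frequency} mapping.

    `docs` is assumed to map each document ID to the list of tokens
    produced by the preprocessing module (hypothetical layout).
    """
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for token in tokens:
            # Count how often the token occurs in this document.
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return dict(index)


# Made-up sample data, only to show the shape of the result.
docs = {
    "d1": ["cat", "sat", "mat", "cat"],
    "d2": ["dog", "sat"],
}
index = build_inverted_index(docs)
print(index["cat"])  # postings for "cat"
print(index["sat"])  # postings for "sat"
```

Storing the term frequency in each posting means the document frequency of a term is simply the length of its postings dictionary, which is convenient for the weighting step.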
For weighting, you can use the tf-idf weighting scheme (w_ij = tf_ij × idf_i). For each query, your system will produce a ranked list of documents, starting with the most similar to the query and ending with the least similar. For the query terms, you can use a modified tf-idf weighting scheme: w_iq = (0.5 + 0.5 tf_iq) × idf_i.
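The two weighting formulas above could be computed along these lines. This is only a sketch under stated assumptions: the issue does not define idf, so the code assumes idf_i = log2(N / df_i), where N is the number of documents and df_i is the number of documents containing term i. It also uses raw term counts for tf. The function names and the small hard-coded index are hypothetical.

```python
import math


def compute_idf(index, n_docs):
    # Assumption: idf_i = log2(N / df_i); the issue does not fix a definition.
    return {term: math.log2(n_docs / len(postings))
            for term, postings in index.items()}


def document_weights(index, idf_values):
    # w_ij = tf_ij * idf_i, grouped per document.
    weights = {}
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            weights.setdefault(doc_id, {})[term] = tf * idf_values[term]
    return weights


def query_weights(query_tokens, idf_values):
    # w_iq = (0.5 + 0.5 * tf_iq) * idf_i; terms outside the vocabulary are skipped.
    counts = {}
    for token in query_tokens:
        if token in idf_values:
            counts[token] = counts.get(token, 0) + 1
    return {term: (0.5 + 0.5 * tf) * idf_values[term]
            for term, tf in counts.items()}


# Made-up index in the form term -> {doc_id: raw term frequency}.
index = {
    "cat": {"d1": 2},
    "sat": {"d1": 1, "d2": 1},
    "dog": {"d2": 1},
}
idf_values = compute_idf(index, n_docs=2)
doc_w = document_weights(index, idf_values)
q_w = query_weights(["cat", "cat", "dog", "bird"], idf_values)
print(doc_w)
print(q_w)
```

Note that under this idf definition a term appearing in every document (here "sat") gets weight 0, so it contributes nothing to the ranking.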