iit-cs429/main

A2: cosine score() in score.py

Closed this issue · 2 comments

Prof,

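Cosine sim(query, doc) is the sum, over shared terms, of the query idf weight times the document tf-idf weight, divided by (query norm × doc norm):

$$\mathrm{cosine}(q, d) = \frac{\sum_t w_{t,q}\, w_{t,d}}{\lVert q \rVert \, \lVert d \rVert}$$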

We have the document norms from Index.compute_doc_norms, but what about the query norm?
Also, norms = [] starts at index 0, but doc_id starts at 1. What is norm[0] used for,
given that we look up norms by norm[doc_id]?

I'm very confused.

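As we discussed, there is no need to compute the query norm, since it does not affect rankings: the query norm is the same for every document, so dividing by it scales all scores by the same constant and leaves their order unchanged.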
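A minimal sketch illustrating why the query norm can be dropped: ||q|| is one positive constant shared by all documents, so including it rescales every score equally and the ranking comes out the same. The function names `cosine_scores` and `rank`, the `use_query_norm` flag, and the toy postings (which store precomputed tf-idf weights instead of raw term frequencies) are hypothetical and are not the score.py API.

```python
import math
from collections import defaultdict


def cosine_scores(query_weights, index, doc_norms, use_query_norm=False):
    # Accumulate the dot product q . d for every document sharing a query term.
    scores = defaultdict(float)
    for term, q_w in query_weights.items():
        for doc_id, d_w in index.get(term, []):
            scores[doc_id] += q_w * d_w
    # The query norm is one constant shared by all documents; optional here.
    q_norm = math.sqrt(sum(w * w for w in query_weights.values())) if use_query_norm else 1.0
    # Divide by the document norm (and, optionally, the query norm).
    return {doc_id: s / (doc_norms[doc_id] * q_norm) for doc_id, s in scores.items()}


def rank(scores):
    # Highest score first; ties broken by the smaller doc_id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy data made up for illustration: postings are [doc_id, tf-idf weight].
index = {'a': [[1, 0.5], [2, 0.2]], 'b': [[2, 0.9], [3, 0.4]]}
doc_norms = {1: 1.0, 2: 2.0, 3: 0.5}
query = {'a': 1.0, 'b': 2.0}

print(rank(cosine_scores(query, index, doc_norms)))                       # [3, 2, 1]
print(rank(cosine_scores(query, index, doc_norms, use_query_norm=True)))  # [3, 2, 1]
```

The scores themselves differ between the two calls (each is divided by sqrt(5) in the second), but the order does not, which is all a ranked retrieval result depends on.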

What is the use of the norm[0] value?

In that doctest, the index contains a document with id 0:

>>> norms = Index().compute_doc_norms({'a': [[0, 3]], 'b': [[0, 4], [1, 5]]}, 2, {'a': 1, 'b': 2})

Here, document 0 has term frequencies {'a': 3, 'b': 4}.

Thus, norms[0] is the norm for the document with id 0. (Recall that the return value is a dict from document id to norm, not a list.) When running on the real dataset, there will be no document with id 0, since document ids start at 1.
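A minimal standalone sketch of a function with the same inputs as the doctest, showing that the result is a dict keyed by document id. The name `compute_doc_norms_sketch` is hypothetical (it is not the assignment's `Index.compute_doc_norms`), and the weighting w = (1 + log10(tf)) × log10(N / df) is an assumption chosen because it reproduces the 0.444... value reported for norm[0] in the doctest comment; the assignment's exact formula may differ.

```python
import math


def compute_doc_norms_sketch(index, num_docs, doc_freqs):
    # Sum of squared tf-idf weights per document, keyed by doc_id.
    sq_sums = {}
    for term, postings in index.items():
        # ASSUMPTION: idf = log10(N / df); it is 0 for a term in every document.
        idf = math.log10(num_docs / doc_freqs[term])
        for doc_id, tf in postings:
            # ASSUMPTION: log-scaled tf, w = (1 + log10(tf)) * idf.
            w = (1 + math.log10(tf)) * idf
            sq_sums[doc_id] = sq_sums.get(doc_id, 0.0) + w * w
    # Euclidean norm per document; the result is a dict, not a list.
    return {doc_id: math.sqrt(s) for doc_id, s in sq_sums.items()}


norms = compute_doc_norms_sketch({'a': [[0, 3]], 'b': [[0, 4], [1, 5]]}, 2, {'a': 1, 'b': 2})
print(norms[0])  # about 0.4447: only 'a' contributes, since 'b' has idf 0
print(norms[1])  # 0.0: document 1 contains only 'b'
```

Because the keys are whatever document ids appear in the postings, an index whose ids start at 1 simply produces no key 0; there is no unused slot as there would be with a list.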

Prof,
In the comment section, it shows norm[0] = 0.444..., so I thought it must be a list of tf weights. Also, you mentioned that doc_id should start from 1, not 0, so we started from 1 without noticing that the TIME file itself has *FILE 1 as its first line.