Incorrect IDF calculation
leik-software opened this issue · 2 comments
I think the IDF value in \TextAnalysis\Indexes\TfIdf::buildIndex is calculated incorrectly: with my example I get only zero values. As shown in this article https://janav.wordpress.com/2013/10/27/tf-idf-and-cosine-similarity/, the calculation in line 50 should be:
$value = 1 + log($count / $value);
(i.e., add 1 to the result of log())
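A minimal sketch comparing the two variants, assuming `$count` is the total number of documents and `$value` (called `$df` here) is the number of documents containing the term. The function `idf()` and its `$addOne` flag are hypothetical and not part of the TextAnalysis API; PHP's `log()` uses the natural logarithm by default.

```php
<?php
// Hypothetical helper (not part of TextAnalysis): IDF for a single term.
// $count = total number of documents, $df = number of documents containing the term.
function idf(int $count, int $df, bool $addOne = false): float
{
    $value = log($count / $df);        // plain IDF, natural log
    return $addOne ? 1 + $value : $value; // optional "+1" variant from the article
}

echo idf(1, 1), PHP_EOL;       // 0  (term in the only document: plain IDF vanishes)
echo idf(1, 1, true), PHP_EOL; // 1  (the +1 variant never drops below 1 here)
```

Note that the +1 variant changes every weight, not only the single-document case, so it also shifts rankings for larger collections.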
@leik-software, could you supply a test case that demonstrates the incorrect result?
Thank you,
My case has just one document, so without the added 1 the calculation works out like this:
$value = log($count / $value);
$value = log(1/1);
$value = log(1);
$value = 0;
With this zero result every TF-IDF weight is zero, so when I then calculate the cosine similarity, the vector norms in the denominator are zero and I end up dividing by zero. Adding 1 avoids this error. But this is just my case; I have found examples both with and without the added 1, so I'm closing this issue again.
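A minimal sketch of where the division by zero appears, assuming TF-IDF vectors are plain PHP arrays of equal length. `cosineSimilarity()` is a hypothetical helper, not a TextAnalysis function. Instead of adding 1 to the IDF, it guards against a zero denominator, which is one alternative convention that keeps the plain IDF definition.

```php
<?php
// Hypothetical helper (not part of TextAnalysis): cosine similarity of two
// equal-length weight vectors.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $y = $b[$i];
        $dot += $x * $y;
        $normA += $x * $x;
        $normB += $y * $y;
    }
    $denominator = sqrt($normA * $normB);
    // With one document every plain IDF is 0, so every weight and both norms are 0.
    // In PHP 8, dividing by 0.0 throws DivisionByZeroError; return 0.0 instead
    // (an assumed convention: zero vectors are treated as "no similarity").
    if ($denominator == 0.0) {
        return 0.0;
    }
    return $dot / $denominator;
}

$tf = [2, 1];                                   // term frequencies in the single document
$plain = array_map(fn($t) => $t * 0.0, $tf);    // plain IDF = log(1/1) = 0
$plusOne = array_map(fn($t) => $t * 1.0, $tf);  // +1 IDF = 1 + log(1/1) = 1

echo cosineSimilarity($plain, $plain), PHP_EOL;     // 0 (guard hit)
echo cosineSimilarity($plusOne, $plusOne), PHP_EOL; // 1
```

The +1 variant avoids the zero vectors entirely, while the guard keeps the textbook IDF and handles the degenerate case explicitly; which one fits depends on whether a single-document corpus should produce meaningful similarities at all.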