Ch11 Understanding TF-IDF normalization
intelligencethink opened this issue · 0 comments
The implementation of tfidf shown on page 326 is as follows:
def tfidf(term, document, dataset):
    term_freq = document.count(term)
    doc_freq = math.log(sum(doc.count(term) for doc in dataset) + 1)
    return term_freq / doc_freq
Is this correct? According to the formula, the total number of documents in the dataset should be part of the calculation, but it does not appear anywhere in doc_freq.
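
For comparison, here is a minimal sketch of the more common textbook definition, tf × log(N / df), where N is the total number of documents and df is the number of documents containing the term. It assumes each document is a list of tokens. The names `tfidf_book` and `tfidf_standard` and the sample `docs` are made up for illustration and are not from the book; `tfidf_book` simply repeats the function quoted above so the two can be run side by side.

```python
import math


def tfidf_book(term, document, dataset):
    # Same logic as the function quoted from page 326: raw term count
    # divided by the log of (total occurrences across the dataset + 1).
    # Note: raises ZeroDivisionError when the term never occurs, since log(1) == 0.
    term_freq = document.count(term)
    doc_freq = math.log(sum(doc.count(term) for doc in dataset) + 1)
    return term_freq / doc_freq


def tfidf_standard(term, document, dataset):
    # Hypothetical comparison version: tf * log(N / df).
    term_freq = document.count(term)
    n_docs = len(dataset)  # total number of documents (N)
    docs_with_term = sum(1 for doc in dataset if term in doc)  # df
    if docs_with_term == 0:
        return 0.0  # term is absent everywhere; treat its weight as zero
    return term_freq * math.log(n_docs / docs_with_term)


# Made-up sample data: each document is a list of tokens.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran", "far"],
]

for term, doc in [("the", docs[0]), ("cat", docs[0]), ("dog", docs[1])]:
    print(term, tfidf_book(term, doc, docs), tfidf_standard(term, doc, docs))
```

With this sample data, "the" occurs in every document, so `tfidf_standard` gives it a weight of zero, while `tfidf_book` still gives it a positive score. Both versions rank the rarer "dog" above "cat", but only the second one depends on the dataset size.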