ttezel/bayes

Classifier does not work when text contains "constructor" as a token.

Closed this issue · 5 comments

The problem is this line: https://github.com/ttezel/bayes/blob/master/lib/naive_bayes.js#L248

Naivebayes.prototype.frequencyTable = function (tokens) {
  var frequencyTable = {}

  tokens.forEach(function (token) {
    if (!frequencyTable[token])
      frequencyTable[token] = 1
    else
      frequencyTable[token]++
  })

  return frequencyTable
}

When the token is "constructor", frequencyTable[token] is truthy even before anything has been stored, because every plain object in JavaScript inherits a constructor property from Object.prototype. Therefore frequencyTable[token]++ runs on the inherited function, and this results in NaN.
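The failure can be reproduced outside the library. This is a minimal sketch using a standalone copy of the function quoted above; the name countTokens is only illustrative and is not part of bayes.

```javascript
// Standalone copy of the truthiness-based counting logic (illustrative name).
function countTokens(tokens) {
  var table = {}
  tokens.forEach(function (token) {
    if (!table[token])
      table[token] = 1
    else
      table[token]++   // runs for "constructor": the inherited function is truthy
  })
  return table
}

var ok = countTokens(['spam', 'spam'])
var broken = countTokens(['constructor'])

console.log(ok.spam)               // 2
console.log(broken.constructor)    // NaN
```

Incrementing a function converts it to a number first, which yields NaN, and that NaN is then stored as an own property of the table.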

To fix this, we need to check if (!frequencyTable.hasOwnProperty(token)) instead. This shadows the inherited constructor property with an own property on the table, but we do not need it for the object anyway.
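A sketch of the own-property check, written as a standalone function rather than as the actual patch to bayes. One assumption of mine: the check is called through Object.prototype.hasOwnProperty.call instead of frequencyTable.hasOwnProperty(token), because a token named "hasOwnProperty" would otherwise overwrite the method on the table itself.

```javascript
// Count tokens using an own-property check instead of a truthiness check.
function frequencyTableOwn(tokens) {
  var table = {}
  var hasOwn = Object.prototype.hasOwnProperty   // borrowed, so tokens cannot shadow it

  tokens.forEach(function (token) {
    if (!hasOwn.call(table, token))
      table[token] = 1
    else
      table[token]++
  })

  return table
}

var ownCounts = frequencyTableOwn([
  'constructor', 'constructor', 'hasOwnProperty', 'hasOwnProperty', 'toString'
])

console.log(ownCounts.constructor)      // 2
console.log(ownCounts.hasOwnProperty)   // 2
console.log(ownCounts.toString)         // 1
```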

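You can also do frequencyTable = Object.create(null) instead, which should be faster. The resulting object has no prototype, so it inherits nothing and the existing truthiness check keeps working. It is also cleaner than overwriting frequencyTable.constructor.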
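A sketch of the prototype-less variant, again as a standalone function and not the library's actual code. The counting logic is unchanged; only the way the table is created differs.

```javascript
// Count tokens in an object created without a prototype.
function frequencyTableNullProto(tokens) {
  var table = Object.create(null)   // no Object.prototype, so no inherited keys

  tokens.forEach(function (token) {
    if (!table[token])
      table[token] = 1
    else
      table[token]++
  })

  return table
}

var nullProtoCounts = frequencyTableNullProto(['constructor', 'constructor', 'toString'])

console.log(nullProtoCounts.constructor)   // 2
console.log(nullProtoCounts.toString)      // 1
```

Note that such an object has no hasOwnProperty method of its own, so combining both approaches requires Object.prototype.hasOwnProperty.call(table, token) rather than table.hasOwnProperty(token).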

Yes, you are right. However, I think we should do both.

This has been fixed in bayes v0.0.5. Run npm update bayes to get it! Thanks.

To be safe, you also need to apply this to this.vocabulary, this.docCount, this.wordCount, this.wordFrequencyCount and this.wordFrequencyCount[categoryName].

Use a Map instead, since ES6 is a thing now.
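A sketch of what the Map suggestion could look like, assuming an ES6-capable runtime. The function name is illustrative and this is not how bayes is implemented.

```javascript
// Count tokens in a Map, where keys never collide with inherited properties.
function frequencyMap(tokens) {
  const table = new Map()

  for (const token of tokens) {
    // get() returns undefined for unseen tokens, so fall back to 0.
    table.set(token, (table.get(token) || 0) + 1)
  }

  return table
}

const mapCounts = frequencyMap(['constructor', 'constructor', '__proto__'])

console.log(mapCounts.get('constructor'))   // 2
console.log(mapCounts.get('__proto__'))     // 1
```

The trade-off is that callers must read counts with get() instead of bracket access, and any state that is serialized with JSON.stringify would need an explicit conversion first, because JSON.stringify turns a Map into "{}".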