ttezel/bayes

Not taking custom tokenizer?


I'm rusty on my JS so I'm probably doing something dumb here, but I can't get your classifier to take a custom tokenizer.

const classifier = bayes({'tokenizer': tokenizer});

var tokenizer = function (text) {
  var rgxPunctuation = /[^(a-zA-Z)+\s]/g

  var sanitized = text.replace(rgxPunctuation, ' ').toLowerCase();

  return sanitized.split(/\s+/)
}

If I put a console.log inside the tokenizer, nothing is printed, so it's clearly not getting executed.

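I am just passing through, but you might try moving your var tokenizer definition above const classifier (where it is used) and adding "new" in front of bayes({...}). As written, the var declaration is hoisted but its assignment is not, so tokenizer is still undefined at the point where it is passed to bayes: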

var tokenizer = function (text) {
  var rgxPunctuation = /[^(a-zA-Z)+\s]/g

  var sanitized = text.replace(rgxPunctuation, ' ').toLowerCase();

  return sanitized.split(/\s+/)
};

const classifier = new bayes({'tokenizer': tokenizer});

I tried the above code in RunKit and it appeared to work as expected.

Note: You could also use a function declaration for tokenizer. A function declaration is hoisted together with its body, so it can keep its position below the line that uses it:

// since you are already using ES6, you might consider the object properties shorthand ;)
// the "new" is needed either way
const classifier = new bayes({tokenizer});

function tokenizer(text) {
  var rgxPunctuation = /[^(a-zA-Z)+\s]/g

  var sanitized = text.replace(rgxPunctuation, ' ').toLowerCase();

  return sanitized.split(/\s+/)
}
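
To see the difference between the two forms in isolation, here is a minimal sketch that does not use the bayes package at all. makeClassifier is a hypothetical stand-in I made up for illustration; it only assumes the common pattern of "use options.tokenizer if it is set, otherwise fall back to a default", which may or may not match what the library does internally.

```javascript
// Hypothetical stand-in for a classifier factory; NOT the bayes library's code.
// Assumption: it falls back to a default tokenizer when none is supplied.
function makeClassifier(options) {
  const defaultTokenizer = (text) => text.split(/\s+/);
  const tokenizer = (options && options.tokenizer) || defaultTokenizer;
  return { tokenizer, usesCustom: tokenizer !== defaultTokenizer };
}

// Case 1: function expression assigned to a var AFTER it is used.
// The declaration is hoisted, but the value is still undefined on this line.
const early = makeClassifier({ tokenizer: varTokenizer });
var varTokenizer = function (text) {
  return text.toLowerCase().split(/\s+/);
};

// Case 2: function declaration placed AFTER it is used.
// The whole function is hoisted, so it is already available on this line.
const hoisted = makeClassifier({ tokenizer: declaredTokenizer });
function declaredTokenizer(text) {
  return text.toLowerCase().split(/\s+/);
}

console.log(early.usesCustom);   // false: the default tokenizer was used
console.log(hoisted.usesCustom); // true: the custom tokenizer was picked up
```

Had the tokenizer been declared with const or let instead of var, the first case would throw a ReferenceError rather than silently passing undefined, which makes this kind of ordering mistake easier to spot.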

Ah, thanks so much @jhwohlgemuth !
I hate getting rusty on things.
All worky.