Configurable Naive Bayes Classifier for text with cross-validation support
Classify text, analyse sentiments, and recognize user intents for chatbots using wink-naive-bayes-text-classifier. It is a part of wink, a growing family of high-quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in NodeJS.
Its API offers a rich set of features:
- Configure text preparation tasks such as amplify negation, tokenize, stem, remove stop words, and propagate negation using wink-nlp-utils or any other package of your choice.
- Configure Lidstone or Laplace additive smoothing.
- Configure Multinomial or Binarized Multinomial Naive Bayes model.
- Export and import learnings in JSON format that can be easily saved on hard-disk.
- Evaluate learning to perform n-fold cross validation.
- Obtain comprehensive metrics including confusion matrix, precision, and recall.
Use npm to install:
npm install wink-naive-bayes-text-classifier --save
// Load Naive Bayes Text Classifier
var Classifier = require( 'wink-naive-bayes-text-classifier' );
// Instantiate
var nbc = Classifier();
// Load NLP utilities
var nlp = require( 'wink-nlp-utils' );
// Configure preparation tasks
nbc.definePrepTasks( [
// Simple tokenizer
nlp.string.tokenize0,
// Common Stop Words Remover
nlp.tokens.removeWords,
// Stemmer to obtain base word
nlp.tokens.stem
] );
// Configure behavior
nbc.defineConfig( { considerOnlyPresence: true, smoothingFactor: 0.5 } );
// Train!
nbc.learn( 'I want to prepay my loan', 'prepay' );
nbc.learn( 'I want to close my loan', 'prepay' );
nbc.learn( 'I want to foreclose my loan', 'prepay' );
nbc.learn( 'I would like to pay the loan balance', 'prepay' );
nbc.learn( 'I would like to borrow money to buy a vehicle', 'autoloan' );
nbc.learn( 'I need loan for car', 'autoloan' );
nbc.learn( 'I need loan for a new vehicle', 'autoloan' );
nbc.learn( 'I need loan for a new mobike', 'autoloan' );
nbc.learn( 'I need money for a new car', 'autoloan' );
// Consolidate all the training!!
nbc.consolidate();
// Start predicting...
console.log( nbc.predict( 'I would like to borrow 50000 to buy a new Audi R8 in New York' ) );
// -> autoloan
console.log( nbc.predict( 'I want to pay my car loan early' ) );
// -> prepay
Defines the text preparation tasks to transform raw incoming text into an array of tokens required during learn(), evaluate() and predict() operations. The tasks should be an array of functions. The first function in this array must accept a string as input, and the last function must return an array of tokens as JavaScript Strings. Each function must accept one input argument and return a single value. definePrepTasks returns the count of tasks.
As illustrated in the usage, wink-nlp-utils offers a rich set of such functions.
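The contract described above (first task takes a string, each task feeds the next, last task returns an array of string tokens) can be illustrated with a minimal sketch in plain JavaScript. The helper `runPrepTasks` and the three task functions are hypothetical stand-ins written for this illustration; they are not part of wink-naive-bayes-text-classifier or wink-nlp-utils, and the classifier's internal pipeline may differ.

```javascript
// Hypothetical task 1: accepts a string, returns an array of lowercase tokens.
function tokenize( text ) {
  return text.toLowerCase().split( /[^a-z0-9]+/ ).filter( Boolean );
}

// Hypothetical task 2: drops a few common stop words (illustrative list only).
var stopWords = new Set( [ 'i', 'to', 'my', 'a', 'the', 'for' ] );
function removeStopWords( tokens ) {
  return tokens.filter( function ( t ) { return !stopWords.has( t ); } );
}

// Hypothetical task 3: a crude suffix stripper standing in for a real stemmer.
function crudeStem( tokens ) {
  return tokens.map( function ( t ) { return t.replace( /(ing|s)$/, '' ); } );
}

// Assumed pipeline behavior: each task receives the previous task's output.
function runPrepTasks( tasks, input ) {
  return tasks.reduce( function ( value, task ) { return task( value ); }, input );
}

var tasks = [ tokenize, removeStopWords, crudeStem ];
var tokens = runPrepTasks( tasks, 'I want to prepay my loans' );
console.log( tokens );
// -> [ 'want', 'prepay', 'loan' ]
```

Any function with one input argument and a single return value fits this shape, which is why tasks from wink-nlp-utils or another package can be mixed freely.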
Defines the configuration from the config object. This object must define two properties: (a) considerOnlyPresence and (b) smoothingFactor. The considerOnlyPresence must be a boolean: true indicates a binarized model; its default value is false. The smoothingFactor defines the value for additive smoothing; its default value is 1. The defineConfig() must be called before attempting to learn.
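To make the two settings concrete, here is a rough sketch of what binarization and additive smoothing mean in general. The functions `countTokens` and `smoothedProbability` are hypothetical and written only for this illustration; they show the textbook formula `( count + k ) / ( total + k * vocabularySize )` and are not taken from the package's source. A smoothing factor of 1 corresponds to Laplace smoothing; other positive values correspond to Lidstone smoothing.

```javascript
// Counts tokens for one label. When considerOnlyPresence is true, each
// distinct token contributes at most once per sample (binarized model).
function countTokens( samples, considerOnlyPresence ) {
  var counts = Object.create( null );
  samples.forEach( function ( tokens ) {
    var list = considerOnlyPresence ? Array.from( new Set( tokens ) ) : tokens;
    list.forEach( function ( t ) { counts[ t ] = ( counts[ t ] || 0 ) + 1; } );
  } );
  return counts;
}

// Textbook additive smoothing: ( count + k ) / ( total + k * vocabularySize ).
function smoothedProbability( counts, token, vocabularySize, smoothingFactor ) {
  var total = Object.keys( counts ).reduce( function ( sum, t ) {
    return sum + counts[ t ];
  }, 0 );
  return ( ( counts[ token ] || 0 ) + smoothingFactor ) /
         ( total + smoothingFactor * vocabularySize );
}

// Two tokenized samples of one label; 'mobike' is assumed to occur only
// under some other label, so the overall vocabulary has 4 words.
var samples = [ [ 'loan', 'loan', 'car' ], [ 'loan', 'vehicle' ] ];
var vocabularySize = 4;

var multinomial = countTokens( samples, false );
var binarized = countTokens( samples, true );
console.log( multinomial.loan, binarized.loan );
// -> 3 2

// An unseen token still gets a non-zero probability.
var pLaplace = smoothedProbability( multinomial, 'mobike', vocabularySize, 1 );
var pLidstone = smoothedProbability( multinomial, 'mobike', vocabularySize, 0.5 );
console.log( pLaplace > pLidstone );
// -> true
```

A smaller smoothing factor gives less probability mass to unseen words, which tends to matter most when the training set is small.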
Simply learns that the input belongs to the label. If the input is a JavaScript String, then definePrepTasks() must be called before learning.
Consolidates the learning. It is a prerequisite for evaluate() and/or predict().
It is used to evaluate the learning against a test data set. The input is used to predict the label, which is compared with the label to populate a confusion matrix.
It computes detailed metrics consisting of macro-averaged precision, recall and f-measure along with their label-wise values and the confusion matrix.
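For readers unfamiliar with these measures, the following sketch shows how a confusion matrix and macro-averaged precision, recall and f-measure are conventionally derived from pairs of actual and predicted labels. The function `computeMetrics`, its input format and the property names of its result are assumptions made for this illustration; the shape of the object returned by the package may differ.

```javascript
// pairs: array of [ actualLabel, predictedLabel ] collected during evaluation.
function computeMetrics( pairs ) {
  // Collect every label seen as either an actual or a predicted value.
  var labels = Array.from( new Set( pairs.reduce( function ( all, p ) {
    return all.concat( p );
  }, [] ) ) );

  // confusion[ actual ][ predicted ] holds the number of such outcomes.
  var confusion = {};
  labels.forEach( function ( actual ) {
    confusion[ actual ] = {};
    labels.forEach( function ( predicted ) { confusion[ actual ][ predicted ] = 0; } );
  } );
  pairs.forEach( function ( p ) { confusion[ p[ 0 ] ][ p[ 1 ] ] += 1; } );

  var precision = {};
  var recall = {};
  var fMeasure = {};
  labels.forEach( function ( label ) {
    var tp = confusion[ label ][ label ];
    var predictedAsLabel = labels.reduce( function ( s, a ) { return s + confusion[ a ][ label ]; }, 0 );
    var actuallyLabel = labels.reduce( function ( s, p ) { return s + confusion[ label ][ p ]; }, 0 );
    precision[ label ] = predictedAsLabel ? tp / predictedAsLabel : 0;
    recall[ label ] = actuallyLabel ? tp / actuallyLabel : 0;
    var sum = precision[ label ] + recall[ label ];
    fMeasure[ label ] = sum ? ( 2 * precision[ label ] * recall[ label ] ) / sum : 0;
  } );

  // Macro average: unweighted mean of the label-wise values.
  function mean( values ) {
    return labels.reduce( function ( s, l ) { return s + values[ l ]; }, 0 ) / labels.length;
  }

  return {
    confusion: confusion,
    precision: precision,
    recall: recall,
    fMeasure: fMeasure,
    avgPrecision: mean( precision ),
    avgRecall: mean( recall ),
    avgFMeasure: mean( fMeasure )
  };
}

var metrics = computeMetrics( [
  [ 'prepay', 'prepay' ], [ 'prepay', 'prepay' ], [ 'prepay', 'autoloan' ],
  [ 'autoloan', 'autoloan' ], [ 'autoloan', 'autoloan' ], [ 'autoloan', 'autoloan' ]
] );
console.log( metrics.precision.autoloan, metrics.avgPrecision );
// -> 0.75 0.875
```

Macro averaging treats every label equally regardless of how many test samples it has, so a weak minority label pulls the average down visibly.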
Predicts the label for the input. If it is unable to predict then it returns a value 'unknown'.
Computes the log base-2 of odds of every label for the input, and returns the array of [ label, odds ] in descending order of odds. Here is an example of the returned array:
[
[ 'prepay', 6.169686751688911 ],
[ 'autoloan', -6.169686751688911 ]
]
If it is unable to make a prediction, it returns [ [ 'unknown', 0 ] ].
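One plausible way to read "log base-2 of odds" is the log of a label's probability divided by the combined probability of all other labels. The sketch below follows that reading; the function `logOdds` and its input format (a map from label to the base-2 log of prior times likelihood) are assumptions for illustration, and the package's exact computation is not stated here. With exactly two labels this reading yields values of equal magnitude and opposite sign, as in the example array above.

```javascript
// scores: { label: log2( prior * likelihood ) } for one input.
// Assumes at least two labels; with a single label the odds are infinite.
function logOdds( scores ) {
  var labels = Object.keys( scores );
  var odds = labels.map( function ( label ) {
    // Sum the (unnormalized) probability of all the other labels.
    var others = labels
      .filter( function ( l ) { return l !== label; } )
      .reduce( function ( sum, l ) { return sum + Math.pow( 2, scores[ l ] ); }, 0 );
    return [ label, scores[ label ] - Math.log2( others ) ];
  } );
  // Descending order of odds.
  return odds.sort( function ( a, b ) { return b[ 1 ] - a[ 1 ]; } );
}

var result = logOdds( { autoloan: -20, prepay: -14 } );
console.log( result );
```

Here prepay comes first with odds of about 6 and autoloan second with about -6: prepay is roughly 2^6 = 64 times more likely than autoloan under the assumed scores.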
The learning can be exported as JSON text that may be saved in a file.
An existing JSON learning can be imported for prediction. It is essential to call definePrepTasks() and consolidate() before attempting to predict.
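A likely reason the preparation tasks have to be defined again after an import is that JSON cannot represent functions. The sketch below demonstrates this with a hypothetical `state` object standing in for a classifier's learnings; its structure is invented for illustration and does not reflect the package's actual export format.

```javascript
// Hypothetical stand-in for a classifier's state: learned data plus
// preparation functions. The real export format may look different.
var state = {
  counts: { prepay: { loan: 4 }, autoloan: { loan: 3 } },
  prepTasks: [ function ( s ) { return s.split( ' ' ); } ]
};

// Serialize to JSON text (this string is what would be written to a file)...
var json = JSON.stringify( state );
// ...and read it back.
var restored = JSON.parse( json );

console.log( restored.counts.prepay.loan );
// -> 4
console.log( restored.prepTasks );
// -> [ null ]
```

The learned counts survive the round trip, while the function inside the array is replaced by null, so any function-valued setup has to be supplied again in code.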
Returns basic stats of learning in terms of count of samples under each label, total words, and the size of vocabulary.
It completely resets the classifier by re-initializing all the learning related variables, except the preparatory tasks. It is useful during n-fold cross-validation.
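As a rough sketch of where a reset fits into n-fold cross-validation, the hypothetical helper `makeFolds` below splits labelled samples into train and test partitions. It is not part of the package, and the split is deterministic for clarity; real data would normally be shuffled first.

```javascript
// Splits data into n folds; in fold i, every n-th sample starting at
// index i is held out for testing and the rest is used for training.
function makeFolds( data, n ) {
  var folds = [];
  for ( var i = 0; i < n; i += 1 ) {
    folds.push( {
      train: data.filter( function ( _, index ) { return index % n !== i; } ),
      test: data.filter( function ( _, index ) { return index % n === i; } )
    } );
  }
  return folds;
}

var data = [
  [ 'I want to prepay my loan', 'prepay' ],
  [ 'I need loan for car', 'autoloan' ],
  [ 'I want to close my loan', 'prepay' ],
  [ 'I need loan for a new vehicle', 'autoloan' ],
  [ 'I want to foreclose my loan', 'prepay' ],
  [ 'I need money for a new car', 'autoloan' ]
];

var folds = makeFolds( data, 3 );
folds.forEach( function ( fold ) {
  // With a classifier instance, each iteration would roughly be:
  // reset the classifier, learn() every sample in fold.train,
  // consolidate(), then evaluate() every sample in fold.test.
  console.log( fold.train.length, fold.test.length );
} );
```

Because a reset keeps the preparatory tasks, they need to be defined only once before the loop.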
If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a pull request.
wink-naive-bayes-text-classifier is copyright 2017 GRAYPE Systems Private Limited.
It is licensed under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3 of the License.