Scripts that run against Watson Assistant for
- `KFOLD`: k-fold cross validation on the training set,
- `BLIND`: evaluating a blind test, and
- `TEST`: testing the WA against a list of utterances.
For a k-fold cross validation or a blind test, the tool outputs a precision curve, per-intent true positive rate and positive predictive value, and a confusion matrix.
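To make the per-intent numbers concrete, here is a minimal sketch of how true positive rate, positive predictive value, and a confusion matrix can be derived from (golden intent, predicted intent) pairs. The function name `per_intent_metrics` and the sample data are illustrative assumptions, not the tool's own code or output format.

```python
from collections import Counter


def per_intent_metrics(pairs):
    """Hypothetical helper: pairs is an iterable of (golden, predicted) intents."""
    # Confusion matrix stored sparsely: (golden, predicted) -> count.
    confusion = Counter(pairs)
    intents = sorted({intent for pair in confusion for intent in pair})
    metrics = {}
    for intent in intents:
        true_pos = confusion[(intent, intent)]
        golden_total = sum(n for (g, _), n in confusion.items() if g == intent)
        predicted_total = sum(n for (_, p), n in confusion.items() if p == intent)
        metrics[intent] = {
            # Share of utterances with this golden intent that were classified correctly.
            "true_positive_rate": true_pos / golden_total if golden_total else 0.0,
            # Share of predictions of this intent that were correct.
            "positive_predictive_value": true_pos / predicted_total if predicted_total else 0.0,
        }
    return confusion, metrics


# Made-up sample results, for illustration only.
results = [
    ("intent 0", "intent 0"),
    ("intent 0", "intent 1"),
    ("intent 1", "intent 1"),
]
confusion, metrics = per_intent_metrics(results)
```

Each row of the confusion matrix corresponds to a golden intent and each column to a predicted intent, so off-diagonal counts show which intents are mistaken for one another.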
- Easy to set up in one configuration file.
- Saves its state when the Assistant service goes down in the middle of processing.
- Able to resume from where it stopped, using modularized scripts.
- Python 3.6.4+
- Mac users: you may need to initialize Python's SSL certificate store by running `Install Certificates.command`, found in `/Applications/Python`.
- Install dependencies: `pip3 install -r requirements.txt`
- Set up parameters properly in the configuration file (ex: `config.ini`). Use `config.ini.sample` to bootstrap your configuration.
- Run the process: `python3 run.py -c config.ini` or `python3 run.py -c <path to your config file>`
If you have already installed this utility use these steps to get the latest code.
- Upgrade dependencies: `pip3 install --upgrade -r requirements.txt`
- Update to the latest code level: `git pull`
`config.ini` - Configuration file for `run.py`.
This is formatted differently for each mode. Review the examples below to explore the possible modes and how each is configured.
`test_input_file.csv` - Test set for blind testing and standard test.
For blind test with golden intent used for comparison:
utterance | golden intent |
---|---|
utterance 0 | intent 0 |
utterance 1 | intent 0 |
utterance 2 | intent 1 |
For standard test, the input must have exactly one column; otherwise an error is thrown:
utterance |
---|
utterance 0 |
utterance 1 |
utterance 2 |
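The column rules above can be checked before a run. The following is a minimal sketch, assuming the file is plain CSV without a header row and that the modes are named `blind` and `test`; the helper `check_test_input` is hypothetical and not part of the tool.

```python
import csv
import io

# Assumed mode names: a blind test needs utterance + golden intent,
# a standard test needs the utterance only.
EXPECTED_COLUMNS = {"blind": 2, "test": 1}


def check_test_input(csv_text, mode):
    """Hypothetical helper: return the parsed rows or raise ValueError."""
    expected = EXPECTED_COLUMNS[mode]
    # Skip completely empty lines; csv.reader handles quoted commas.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    for line_no, row in enumerate(rows, start=1):
        if len(row) != expected:
            raise ValueError(
                f"row {line_no}: expected {expected} column(s) for "
                f"'{mode}' mode, found {len(row)}"
            )
    return rows


blind_rows = check_test_input(
    "utterance 0,intent 0\nutterance 1,intent 0\nutterance 2,intent 1\n",
    "blind",
)
```

An utterance that itself contains a comma has to be quoted in the CSV, otherwise it is read as an extra column and fails this check.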
There are a variety of ways to use this tool. Primarily you will execute a k-fold, blind, or standard test.
- Run standard test without ground truth
- Generate precision/recall for classification test
- Generate confusion matrix for classification test
- Generate description for intents
- Generate long-tail classification results
- Run syntax validation patterns on a workspace
- Extract utterances leading to a dialog node
This tool can also be used to test a trained Natural Language Classifier (NLC). The configuration is similar to testing Watson Assistant except:
- Use the NLC URL in the `url` parameter (ex: `https://gateway.watsonplatform.net/natural-language-classifier/api`)
- Specify the `<classifier_id>` in the `workspace_id` parameter in the configuration
- Since NLC does not support downloading training data, the original training data must be provided if run in 'kfold' mode (using the `train_input_file` parameter)
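The three NLC differences above can be expressed as a small configuration check. This is a sketch only: the `[DEFAULT]` section name, the `mode` key, the training file path, and the helper `nlc_config_problems` are assumptions made for illustration; consult `config.ini.sample` for the actual layout.

```python
import configparser

# Illustrative configuration. The section name, the `mode` key, and the
# training file path are assumptions, not copied from config.ini.sample.
SAMPLE = """
[DEFAULT]
mode = kfold
url = https://gateway.watsonplatform.net/natural-language-classifier/api
workspace_id = <classifier_id>
train_input_file = ./data/train.csv
"""


def nlc_config_problems(settings):
    """Hypothetical helper: list what is missing for an NLC run."""
    problems = []
    if "natural-language-classifier" not in settings.get("url", ""):
        problems.append("url does not look like an NLC endpoint")
    if not settings.get("workspace_id"):
        problems.append("workspace_id must hold the classifier id")
    needs_training_data = settings.get("mode", "").lower() == "kfold"
    if needs_training_data and not settings.get("train_input_file"):
        problems.append("kfold mode needs train_input_file for NLC")
    return problems


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
problems = nlc_config_problems(parser["DEFAULT"])
```

With the sample above, `problems` is empty; removing the `train_input_file` line while keeping `mode = kfold` would report the missing training data.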
- Due to different coverage among service plans, users may need to adjust `max_test_rate` accordingly to avoid network connection errors.
- Users on Lite plans are only able to create 5 workspaces. They should set `fold_num=3` in their k-fold configuration file.
- In case of interrupted execution, the tool may not be able to clean up the workspaces it creates. In this case you will need to manually delete the extra workspaces.
- Workspace ID is not the Skill ID. In the Watson Assistant user interface, the Workspace ID can be found on the Skills tab by clicking the three dots (top-right of the skill) and choosing View API Details.
- `SSL: [CERTIFICATE_VERIFY_FAILED]` on Mac means you may need to initialize Python's SSL certificate store by running `Install Certificates.command`, found in `/Applications/Python`.
- "This utility used to work and now it doesn't." Upgrade to the latest dependencies with `pip3 install --upgrade -r requirements.txt` and the latest code with `git pull`.