The Questions on the Natural Language Classifier Application uses the Watson Natural Language Classifier Service to show how to build a question-and-answer application that uses a small amount of training data and to demonstrate some best practices for using the service.
To function correctly, this application requires the following items:
- A trained classifier.
- A populated answer store.
- Training data and answer data. Samples of these data types are provided.
Complete the following instructions to set up these items.
Ensure that you have the following prerequisites before you start:
- You need an IBM Bluemix account. If you don't have one, sign up. For more information about the process, see Developing Watson applications with Bluemix.
- Java Development Kit (Version 7u49+, or 8u45+)
- Apache Maven 3.1 or later releases
- Git
To get started, complete each of the following stages in order:
- Clone the app project, build it, and deploy to Bluemix
- Choose which data you want to use
- Train the classifier
- Populate the answer store
- Clone the framework-ega repository by issuing one of the following commands:
git clone https://github.com/watson-developer-cloud/framework-ega.git
git clone git@github.com:watson-developer-cloud/framework-ega.git
- Run `mvn install` in the root of the framework-ega repository to build and install the components to your local Maven repository.
- Clone the questions-with-classifier-ega repository by issuing one of the following commands:
git clone https://github.com/watson-developer-cloud/questions-with-classifier-ega.git
git clone git@github.com:watson-developer-cloud/questions-with-classifier-ega.git
- Run `mvn install` in the root of the questions-with-classifier-ega repository to build and install the components to your local Maven repository. The `questions-with-classifier-ega-war.zip` file is in the `/questions-with-classifier-ega/questions-with-classifier-ega-war/target` directory.
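The clone and build steps above can also be scripted. The following is a minimal sketch, assuming `git` and `mvn` are on your PATH and that you run it from the directory where you want both repositories to live. The function name `build_all` is hypothetical; the repository URLs and the archive path are taken from the steps above.

```shell
# Sketch only: clones and builds both repositories in the required order.
# build_all is a hypothetical helper name, not part of the project.
build_all() {
  for repo in framework-ega questions-with-classifier-ega; do
    # Clone only if the directory is not already present.
    [ -d "$repo" ] || git clone "https://github.com/watson-developer-cloud/$repo.git" || return 1
    # Build in a subshell so the caller's working directory is unchanged.
    (cd "$repo" && mvn install) || return 1
  done
  # List the deployable archive; this fails if the build did not produce it.
  ls questions-with-classifier-ega/questions-with-classifier-ega-war/target/questions-with-classifier-ega-war.zip
}
```

Calling `build_all` builds framework-ega first because the second project depends on the components that the first build installs into your local Maven repository.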
In this stage, manually create your application in Bluemix, bind the necessary services to it, and deploy the application code that you built in Stage 1.
- Log in to Bluemix and navigate to the Dashboard.
- Create your app. Make sure it's in the "US South" region.
- Click CREATE AN APP.
- Select WEB.
- Select the starter Liberty for Java, and click CONTINUE.
- Type a unique name for your app, such as `qaclassifier-sample-app`, and click Finish.
- Select CF Command Line Interface. If you do not already have it, click Download CF Command Line Interface and install it.
- Click OVERVIEW.
- Add the Natural Language Classifier service to your app. To use an instance of the service that is bound to another app, skip this step.
- Click ADD A SERVICE OR API.
- Select the Watson category, and select the Natural Language Classifier service.
- Ensure that your app is specified in the App dropdown.
- In the Service name field, type a unique name for your service, such as `qaclassifier-sample-classifier`.
- Click CREATE. The Restage Application window is displayed.
- Click RESTAGE to restage your app.
- Add the SQL Database service to your app. To use an instance of the service that is bound to another app, skip this step.
- Click ADD A SERVICE OR API.
- Select the Data and Analytics category, and select the SQL Database service.
- Ensure that your app is specified in the App dropdown.
- In the Service name field, type a unique name for your service, such as `qaclassifier-sample-db`.
- Click CREATE. The Restage Application window is displayed.
- Click RESTAGE to restage your app.
- Bind instances of services to your app. If this step is not applicable, skip it.
- Click BIND A SERVICE OR API.
- Select the services that you want to bind to your app, and click ADD. The Restage Application window is displayed.
- Click RESTAGE to restage your app.
- Deploy the application code that you built in Stage 1 by using the Cloud Foundry commands.
- Open the Command Prompt.
- Navigate to the directory that contains the WAR file that you generated in the previous section by running the following command:
cd /questions-with-classifier-ega/questions-with-classifier-ega-war/target
- Connect to Bluemix by running the following command:
cf api https://api.ng.bluemix.net
- Log in to Bluemix by running the following command. Replace `<yourUsername>` with your Bluemix id, `<yourOrg>` with your organization name, and `<yourSpace>` with your space name.
cf login -u <yourUsername> -o <yourOrg> -s <yourSpace>
- Modify `manifest.yml` so the service names match the names you previously chose when creating them.
- Deploy the app to Bluemix by running the following command. Replace `<yourAppName>` with the name of your app.
cf push <yourAppName> -p questions-with-classifier-ega-war.zip
- If the app is not started, click START.
- To view the home page of the app, open https://yourAppName.mybluemix.net, where yourAppName is the specific name of your app.
The app and its bound services are deployed. However, you must complete the remaining setup stages for the app to function correctly.
Before you can modify the application's database, you must set a password.
- In your Bluemix application, define the following environment variable and set the value to the password you want: `MANAGE_API_PASSWORD`
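These instructions do not say how to define the variable. One option, sketched below, is the CF Command Line Interface that you installed earlier, assuming you are already logged in with `cf login`. The function name `set_manage_password` is hypothetical; `cf set-env` and `cf restage` are standard cf CLI commands, and a restage is what makes the new value visible to the running app.

```shell
# Hypothetical helper: sets MANAGE_API_PASSWORD on an app and restages it.
# Usage: set_manage_password <yourAppName> <password>
set_manage_password() {
  if [ $# -ne 2 ]; then
    echo "usage: set_manage_password <yourAppName> <password>" >&2
    return 1
  fi
  # Define the variable, then restage so the app picks up the change.
  cf set-env "$1" MANAGE_API_PASSWORD "$2" && cf restage "$1"
}
```

For example, `set_manage_password qaclassifier-sample-app <password>`. Note that a password typed on the command line may end up in your shell history.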
In this stage, understand the types of data that the app requires and choose whether you want to use sample data or your own data.
The app requires a trained classifier to work properly. To train the classifier, you need training data, which maps a question to a class. The question-to-class mapping must be in CSV format.
question --> class
Additionally, the app requires a populated answer store. The answer store is populated by a file which maps from classes to canonical questions, and by a directory containing formatted HTML answers. The class-to-canonical-question mapping must be in CSV format.
class --> canonicalQuestion
Choose whether you want to use sample data or your own data.
Note that the application does not return an answer when the classifier's confidence in its result is too low.
The sample data is in the `questions.csv` and `answers.csv` files in the `questions-with-classifier-ega-war > src > main > resources` directory. To use the sample data, go to Stage 3: Train the classifier.
To use your own data instead of the sample data, see Prepare your own data for training the classifier and populating the answer store.
In this stage, train the classifier by using curl.
For more information about training the classifier, see the Classifier API.
- Log in to Bluemix and navigate to your app.
- Click Show Credentials for the Natural Language Classifier service that is bound to your app.
- Copy the values of the `url`, `username`, and `password` parameters in the **credentials** section.
- From a command prompt, run the following curl command. Replace `<username>`, `<password>`, and `<url>` with the credentials you copied. The questions.csv file is assumed to be in the directory from which you run the command. If necessary, change the path to the file.
`curl -u <username>:<password> -F training_data=@questions.csv -F training_metadata="{\"language\":\"en\",\"name\":\"my_classifier\"}" "https://<url>/v1/classifiers"`
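Training may take a while to finish. As a sketch, and assuming the service also answers GET requests on the same `/v1/classifiers` path (a list of your classifiers) and on `/v1/classifiers/<classifier_id>` (details of one classifier, including its status), you could check on the classifier as follows. The function name `classifier_info` is hypothetical, and the URL is built the same way as in the training command above.

```shell
# Hypothetical helper: queries the classifier service with the same credentials
# that were used for training.
# Usage: classifier_info <username> <password> <url> [classifier_id]
classifier_info() {
  if [ $# -lt 3 ]; then
    echo "usage: classifier_info <username> <password> <url> [classifier_id]" >&2
    return 1
  fi
  endpoint="https://$3/v1/classifiers"
  # With a classifier id, ask for that one classifier; otherwise list all of them.
  if [ -n "$4" ]; then
    endpoint="$endpoint/$4"
  fi
  curl -s -u "$1:$2" "$endpoint"
}
```

Because the app selects the first classifier instance in the list, listing the classifiers is also a quick way to confirm that only one instance exists.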
In this stage, populate the answer store by using curl. To populate the answer store in Eclipse, see Populating the answer store in Eclipse.
The JSON file used to populate the answer store is generated during the build. Run `mvn package` from the `questions-with-classifier-ega` directory to build the project and generate this file.
To see the API that populates the answer store, open https://yourAppName.mybluemix.net/api/doc, and see Manage.
- Ensure that your app is running. If it's not running, open your app in Bluemix and click START.
- From a command prompt, run the following curl command. This assumes the command is run from the questions-with-classifier-ega directory. If necessary, change the path to the file.
`curl -X POST -H "Content-Type: application/json" -d @questions-with-classifier-ega-war/target/classes/answers.json http://yourAppName.mybluemix.net/api/v1/manage/answer`
- Celebrate! You successfully built an app that uses a trained classifier. To see it live, open https://yourAppName.mybluemix.net, where yourAppName is the specific name of your app.
- Explore the Advanced development section to learn how to use your own data and how to use Eclipse to train the classifier and populate the answer store.
Use the following information to use Eclipse for training the classifier and populating the answer store and to train a classifier on your own data.
The training tool has not been updated to use the latest Classifier API. Use the curl instructions.
If you have loaded the code into Eclipse, you can use an included main class to help with training the Classifier. The program makes REST API calls to the Classifier API, which can be done through any REST client. Although you can train multiple classifier instances, the app does not provide a way to specify which instance to use. By design, the app asks for a list of instances and selects the first instance in the list. To ensure that you are using the correct instance for your app, train and keep only one instance at a time.
- Locate the `training.json` file. You can use one of the following files:
  * The file that you generated during the Prepare your own data for training the classifier and populating the answer store process.
  * The sample file in the `questions-with-classifier-ega-war > src > main > resources` directory.
- Run the TrainClassifier.java command-line program. Use the following parameters:
```
usage: java com.ibm.watson.app.common.tools.services.classifier.TrainClassifier
 -d,--delete               If specified, the classifier instance will be deleted if training is not successful
 -f,--file <file>          The filepath to be used as training data
 -l,--url <url>            The absolute URL of the NL classifier service to connect to. If omitted, the default will be
                           used (https://gateway-d.watsonplatform.net/natural-language-classifier-alpha/api)
 -p,--password <password>  The password to use during authentication to the NL classifier service
 -u,--username <username>  The username to use during authentication to the NL classifier service
```
You can run the program without specifying a file, or you can specify a file, and the program will launch a training instance. To see all of the commands, type `h` for help.
If you have loaded the code into Eclipse, you can use an included main class to help with populating the answer store. The program makes a REST call to the same /manage/answer API that the curl command uses.
- Locate the `answers.csv` file. You can use one of the following files:
  * The file that you generated during the Prepare your own data for training the classifier and populating the answer store process.
  * The sample file in the `questions-with-classifier-ega-war > src > main > resources` directory.
- Run the PopulateAnswerStore.java command-line program. Use the following parameters:
```
usage: java com.ibm.watson.app.qaclassifier.tools.PopulateAnswerStore
 -d,--directory <directory>  The directory containing the html answer files, can point to the file system or the class
                             path
 -f,--file <file>            The file to be used to populate the answers, can point to the file system or the class
                             path
 -l,--url <url>              The root URL of the application to connect to. If omitted, the default will be used
                             (http://localhost:9080)
 -p,--password <password>    The password for the manage API
 -u,--user <user>            The username for the manage API
```
The program checks the answer store for existing entries before it adds new entries. If existing entries are found, the program stops.
Use the following information to train the classifier on your own data.
A command-line program that prepares your data for training a classifier is included in the app. It does the following tasks:
- Generates the `training.json` file used by the application at runtime.
- Generates the `answers.json` file for populating the answer store via a curl command.
After these files are generated, you replace the sample .json files with them.
Prerequisites
The following information assumes that you have collected and curated ground truth for the application. To use this data in the app, the data must be in a specific format, and it must contain all of the following elements:
- Representative questions.
- Answers to all representative questions.
- A unique associated class name for each answer.
- An associated canonical question for each answer. The canonical question can be an actual question or a paraphrase of one.
- An answer associated by a unique class name with each question.
The methodology for acquiring this data and ensuring that it meets the requirements is outside of the scope of these instructions.
- Generate a CSV file for questions. Use the following format in the questions.csv: QuestionText, LabelId.
| Term | Description |
| ------------- | ------------- |
| QuestionText | The text of the question. |
| LabelId | The unique id of the class that a question corresponds to. It matches the LabelId in the answers file. |
- Generate a CSV file for answers. The command-line program uses a .csv file as input to create and populate an answer store. Use the following format in the answers.csv: LabelId, CanonicalQuestion.
| Term | Description |
| ------------- | ------------- |
| LabelId | The unique id of the label for an answer. It matches the LabelId in the questions file. |
| CanonicalQuestion | The canonical question that is associated with an answer. |
Classes that have no answer value are excluded and are not added to the answer store.
- For each LabelId in your CSV files, create an HTML file containing formatted answer text in a directory. (The example location is questions-with-classifier-ega-war/src/main/resources/answers.) The file should be named ${LabelId}.html. When you call the 'classify' REST API and pass a question or text string, the classifier responds with a list of classes that are best associated with that text string based on its training data and algorithms. If you want to show the user an answer for those classes, you must associate each class with some answer text.
- Generate your own training and answers JSON files. A command-line program creates a training JSON file and an answers JSON file that you can use in the previous stages for training the classifier and populating the answer store. In the `com.ibm.watson.app.qaclassifier.tools` package of the `questions-with-classifier-ega-war` project, the class GenerateTrainingAndPopulationData.java can be run and supplied with the .csv input files. Use the following command-line parameters for this program:
```
usage: java com.ibm.watson.app.qaclassifier.tools.GenerateTrainingAndPopulationData
 -adir,--answerTextDirectory <answerTextDirectory>  directory containing answer html files
 -ain,--answerInput <answerInput>                   input csv file containing answers data
 -aout,--answerOutput <answerOutput>                filename and location for the answer store population data
 -qin,--questionInput <questionInput>               input csv file containing questions and labels
 -qout,--questionOutput <questionOutput>            filename and location for the classifier training data
```
- Replace the sample training.json, which is in the `questions-with-classifier-ega-war > src > main > resources` directory, with your own file. Use the answers.json file to populate the answer store by using curl.
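Before you run the generator, it can help to confirm that the three inputs agree with each other. The following is a minimal sketch, not part of the app: `check_ground_truth` is a hypothetical helper that checks that every LabelId in the questions file also appears in the answers file, and that every LabelId in the answers file has a matching HTML file. It assumes that neither CSV file has a header row and that LabelId values contain no commas, quotes, or spaces.

```shell
# Hypothetical helper, not part of the app.
# Usage: check_ground_truth <questions.csv> <answers.csv> <answerTextDirectory>
check_ground_truth() {
  questions="$1"; answers="$2"; answer_dir="$3"; rc=0
  # LabelId is the last field of questions.csv and the first field of answers.csv,
  # so commas inside a quoted question do not affect the result.
  q_labels=$(tr -d '\r' < "$questions" |
    awk -F, 'NF { gsub(/^[ \t]+|[ \t]+$/, "", $NF); print $NF }' | sort -u)
  a_labels=$(tr -d '\r' < "$answers" |
    awk -F, 'NF { gsub(/^[ \t]+|[ \t]+$/, "", $1); print $1 }' | sort -u)
  # Every class used by a question needs a row in the answers file.
  for label in $q_labels; do
    if ! printf '%s\n' "$a_labels" | grep -qxF -- "$label"; then
      echo "LabelId '$label' is used in $questions but missing from $answers"
      rc=1
    fi
  done
  # Every class in the answers file needs an HTML answer file.
  for label in $a_labels; do
    if [ ! -f "$answer_dir/$label.html" ]; then
      echo "No answer file $answer_dir/$label.html for LabelId '$label'"
      rc=1
    fi
  done
  return $rc
}
```

The function prints one line per problem and returns a nonzero status if it finds any, so it can gate the generation step in a script.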
What to do next
Use your JSON files to train the classifier and populate the answer store.
- Use curl commands
- Use Eclipse