The Questions on the Natural Language Classifier Application uses the Watson Natural Language Classifier Service to show how to build a question-and-answer application that uses minimal ground truth and to demonstrate some best practices for using the service.
To function correctly, this application requires the following items:
- A trained classifier.
- A populated answer store.
- Training data and answer data. Samples of these data types are provided.
Complete the following instructions to set up these items.
Ensure that you have the following prerequisites before you start:
- You need an IBM Bluemix account. If you don't have one, sign up. For more information about the process, see Developing Watson applications with Bluemix.
- Java Development Kit
- Apache Maven 3.1 or later releases
- Git
To get started, complete each of the following stages in order:
- Clone the app project, build it, and deploy to Bluemix
- Determine the data you want to use
- Train the classifier
- Populate the answer store
Clicking the Deploy to Bluemix button performs the following steps automatically:
- Prompts you to log in to Bluemix, or to create an account
- Creates a Bluemix DevOps Services project and initializes a new Git repository
- Clones the questions-with-classifier-ega project into the Git repository
- Builds the project
- Creates any required Bluemix services
- Deploys the app to Bluemix
The DevOps Services project will be set up to automatically deploy changes to Bluemix when you commit new changes to your git repository.
The entire process will take a few minutes to complete. Even though the app is deployed at this point, you still need to follow the steps below before the app will function correctly.
In this stage, understand the types of data that the app requires and choose whether you want to use sample data or your own data.
The app requires a trained classifier to work properly. To train the classifier, you need training data, which maps a question to a class. The question-to-class mapping must be in JSON format.
question --> class
Additionally, the app requires a populated answer store. The answer store is populated by answers data, which maps the classes from the training data to answers. The class-to-answer mapping must be in JSON format.
class --> answer
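The two mappings can be pictured as a two-step lookup. The following is a minimal sketch for illustration only: the questions, class names, and answers are hypothetical, the dictionaries do not reflect the actual schema of training.json or answers.json, and a plain dictionary stands in for the trained classifier, which in the real app also handles questions it has not seen verbatim.

```python
# Hypothetical data: a stand-in for the question --> class training data.
QUESTION_TO_CLASS = {
    "How do I reset my password?": "reset_password",
    "I forgot my password": "reset_password",
    "Where can I see my invoice?": "view_invoice",
}

# Hypothetical data: a stand-in for the class --> answer store.
CLASS_TO_ANSWER = {
    "reset_password": "Use the 'Forgot password' link on the login page.",
    "view_invoice": "Invoices are listed on the Billing page.",
}


def answer_for(question):
    """Resolve a question to an answer through its class, or None if unknown."""
    label = QUESTION_TO_CLASS.get(question)
    if label is None:
        return None
    return CLASS_TO_ANSWER.get(label)
```

Because several questions share one class, an answer is written once per class rather than once per question.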
Choose whether you want to use sample data or your own data.
The sample data is in the `training.json` and `answers.json` files in the `questions-with-classifier-ega-war > src > main > resources` directory. To use the sample data, go to Stage 3: Train the classifier.
To use your own data instead of the sample data, see Prepare your own data for training the classifier and populating the answer store.
In this stage, train the classifier by using curl. To train the classifier in Eclipse, see Training the classifier in Eclipse.
For more information about training the classifier, see the Classifier API.
- Log in to Bluemix and navigate to your app.
- Click Show Credentials for the Natural Language Classifier service that is bound to your app.
- Copy the values of the `url`, `username`, and `password` parameters in the **credentials** section.
- From a command prompt, run the following curl command. Replace `<username>`, `<password>`, and `<url>` with the credentials you copied. The questions.csv file is assumed to be in the directory from which you run the command. If necessary, change the path to the file.
`curl -u <username>:<password> -F training_data=@questions.csv -F training_metadata="{\"language\":\"en\",\"name\":\"my_classifier\"}" "https://<url>/v1/classifiers"`
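Hand-escaping the JSON in `training_metadata` is easy to get wrong. As a hedged sketch, the same curl arguments can be assembled in Python so that the metadata is generated rather than typed. The function name `build_train_command` and its parameters are hypothetical helpers, not part of the app; the flags and the `https://<url>/v1/classifiers` pattern are copied from the command above.

```python
import json
import shlex


def build_train_command(url, username, password, csv_path="questions.csv",
                        name="my_classifier", language="en"):
    """Return the curl argument list for the training request shown above."""
    # Compact separators reproduce the metadata string used in the curl command.
    metadata = json.dumps({"language": language, "name": name},
                          separators=(",", ":"))
    return [
        "curl",
        "-u", f"{username}:{password}",
        "-F", f"training_data=@{csv_path}",
        "-F", f"training_metadata={metadata}",
        f"https://{url}/v1/classifiers",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the credentials copied from Bluemix.
    args = build_train_command("<url>", "<username>", "<password>")
    print(shlex.join(args))  # A shell-quoted command line you can paste.
```

Passing the returned list to `subprocess.run` would execute the request without any shell quoting, assuming curl is installed.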
In this stage, populate the answer store by using curl. To populate the answer store in Eclipse, see Populating the answer store in Eclipse.
To see the API that populates the answer store, open https://yourAppName.mybluemix.net/api, and see Manage.
- Ensure that your app is running. If it's not running, open your app in Bluemix and click START.
- Ensure that your answers.json file matches your training.json file.
- From a command prompt, run the following curl command. The answers.json file is assumed to be in the directory from which you run the command. If necessary, change the path to the file.
`curl -X POST -H "Content-Type: application/json" -d @answers.json http://yourAppName.mybluemix.net/api/v1/manage/answer`
- Celebrate! You successfully built an app that uses a trained classifier. To see it live, open https://yourAppName.mybluemix.net, where yourAppName is the specific name of your app.
- Explore the Advanced development section to learn how to use your own data and how to use Eclipse to train the classifier and populate the answer store.
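For readers who prefer a script to curl, the answer-store upload from this stage could be sketched in Python with the standard library. This is an illustration under assumptions: `build_answer_request` is a hypothetical helper, the URL pattern is copied from the curl command above, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_answer_request(app_name, answers_path="answers.json"):
    """Build (but do not send) the POST request that uploads answers.json."""
    with open(answers_path, "rb") as handle:
        body = handle.read()
    json.loads(body)  # Fail early if the file is not valid JSON.
    return urllib.request.Request(
        f"http://{app_name}.mybluemix.net/api/v1/manage/answer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send the request once the app is running:
#   urllib.request.urlopen(build_answer_request("yourAppName"))
```

Parsing the file before building the request catches a malformed answers.json locally instead of on the server.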
Use the following information to train the classifier and populate the answer store in Eclipse, and to train a classifier on your own data.
If you don't want to use the "Deploy to Bluemix" button, you can clone and build the project yourself.
- Clone the framework-ega repository by issuing one of the following commands:
    `git clone https://github.com/watson-developer-cloud/framework-ega.git`
    `git clone git@github.com:watson-developer-cloud/framework-ega.git`
- Run `mvn install` in the root of the framework-ega repository to build and install the components to your local Maven repository.
- Clone the questions-with-classifier-ega repository by issuing one of the following commands:
    `git clone https://github.com/watson-developer-cloud/questions-with-classifier-ega.git`
    `git clone git@github.com:watson-developer-cloud/questions-with-classifier-ega.git`
- Run `mvn install` in the root of the questions-with-classifier-ega repository to build and install the components to your local Maven repository.
The `questions-with-classifier-ega-war.war` file is in the `/questions-with-classifier-ega/questions-with-classifier-ega-war/target` directory.
Note: Once the project is public on GitHub, the framework-ega dependencies will be available in Maven Central, and the first two steps (cloning and building framework-ega) will no longer be necessary.
In this stage, create your application in Bluemix, bind the necessary services to it, and deploy the application code that you built in Stage 1.
- Log in to Bluemix and navigate to the Dashboard.
- Create your app.
  - Click CREATE AN APP.
  - Select WEB.
  - Select the starter Liberty for Java, and click CONTINUE.
  - Type a unique name for your app, such as `qaclassifier-sample-app`, and click Finish.
  - Select CF Command Line Interface. If you do not already have it, click Download CF Command Line Interface and install it.
  - Click OVERVIEW.
- Add the Natural Language Classifier service to your app. To use an instance of the service that is bound to another app, skip this step.
  - Click ADD A SERVICE OR API.
  - Select the Watson category, and select the Natural Language Classifier service.
  - Ensure that your app is specified in the App dropdown.
  - In the Service name field, type a unique name for your service, such as `qaclassifier-sample-classifier`.
  - Click CREATE. The Restage Application window is displayed.
  - Click RESTAGE to restage your app.
- Add the SQL Database service to your app. To use an instance of the service that is bound to another app, skip this step.
  - Click ADD A SERVICE OR API.
  - Select the Data Management category, and select the SQL Database service.
  - Ensure that your app is specified in the App dropdown.
  - In the Service name field, type a unique name for your service, such as `qaclassifier-sample-db`.
  - Click CREATE. The Restage Application window is displayed.
  - Click RESTAGE to restage your app.
- Bind instances of services to your app. If this step is not applicable, skip it.
  - Click BIND A SERVICE OR API.
  - Select the services that you want to bind to your app, and click ADD. The Restage Application window is displayed.
  - Click RESTAGE to restage your app.
- Deploy the application code that you built in Stage 1 by using the Cloud Foundry commands.
  - Open the Command Prompt.
  - Navigate to the directory that contains the WAR file that you generated in Stage 1 by running the following command:
    `cd /questions-with-classifier-ega/questions-with-classifier-ega-war/target`
  - Connect to Bluemix by running the following command:
    `cf api https://api.ng.bluemix.net`
  - Log in to Bluemix by running the following command. Replace `<yourUsername>` with your Bluemix id, `<yourOrg>` with your organization name, and `<yourSpace>` with your space name.
    `cf login -u <yourUsername> -o <yourOrg> -s <yourSpace>`
  - Deploy the app to Bluemix by running the following command. Replace `<yourAppName>` with the name of your app.
    `cf push <yourAppName> -p questions-with-classifier-ega-war.war`
- If the app is not started, click START.
- To view the home page of the app, open https://yourAppName.mybluemix.net, where yourAppName is the specific name of your app.
The app and its bound services are deployed. However, you must complete the remaining setup stages for the app to function correctly.
If you want to secure the answer store endpoints, see Deploying with security.
If you have loaded the code into Eclipse, you can use an included main class to help with training the classifier. The program makes REST API calls to the Classifier API; the same calls can be made through any REST client. Although you can train multiple classifier instances, the app does not provide a way to specify which instance to use. By design, the app asks for a list of instances and selects the first instance in the list. To ensure that you are using the correct instance for your app, train and keep only one instance at a time.
- Locate the `training.json` file. You can use one of the following files:
  - The file that you generated during the Prepare your own data for training the classifier and populating the answer store process.
  - The sample file in the `questions-with-classifier-ega-war > src > main > resources` directory.
- Run the TrainClassifier.java command-line program. Use the following parameters:
```
usage: java com.ibm.watson.app.common.tools.services.classifier.TrainClassifier
 -d,--delete                If specified, the classifier instance will be deleted if training is not successful
 -f,--file <file>           The filepath to be used as training data
 -l,--url <url>             The absolute URL of the NL classifier service to connect to. If omitted, the default will be
                            used (https://gateway-d.watsonplatform.net/natural-language-classifier-alpha/api)
 -p,--password <password>   The password to use during authentication to the NL classifier service
 -u,--username <username>   The username to use during authentication to the NL classifier service
```
You can run the program without specifying a file, or you can specify a file, and the program will launch a training instance. To see all of the commands, type `h` for help.
If you have loaded the code into Eclipse, you can use an included main class to help with populating the answer store. The program makes a REST call to the same /manage/answer API that the curl command uses.
- Locate the `answers.json` file. You can use one of the following files:
  - The file that you generated during the Prepare your own data for training the classifier and populating the answer store process.
  - The sample file in the `questions-with-classifier-ega-war > src > main > resources` directory.
- Run the PopulateAnswerStore.java command-line program. Use the following parameters:
```
usage: java com.ibm.watson.app.qaclassifier.tools.PopulateAnswerStore
 -l,--url <url>     The root URL of the application to connect to. If omitted, the default will be used
                    (http://localhost:9080)
 -p,--path <path>   The path to be used as training data, can point to the file system or the class path
```
The program checks the answer store for existing entries before it adds new entries. If existing entries are found, the program stops.
Use the following information to train the classifier on your own data.
A command-line program that helps you train a classifier is included in the app. It does the following tasks:
- Generates the `training.json` file for training the classifier.
- Generates the `answers.json` file for populating the answer store.
After these files are generated, you replace the sample .json files with them.
Prerequisites
The following information assumes that you have collected and curated ground truth for the application. To use this data in the app, the data must be in a specific format, and it must contain all of the following elements:
- Representative questions.
- Answers to all representative questions.
- A unique associated class name for each answer.
- An associated canonical question for each answer. The canonical question can be an actual question or a paraphrase of one.
- For each question, an answer that is associated with it by a unique class name.
The methodology for acquiring this data and ensuring that it meets the requirements is outside of the scope of these instructions.
- Generate a CSV file for questions. The command-line program uses a .csv file as input to create the training JSON file to be uploaded to the classifier by using the REST API. Use the following format in the questions.csv: QuestionText, LabelId.
| Term | Description |
| ------------- | ------------- |
| QuestionText | The text of the question. |
| LabelId | The unique id of the class that a question corresponds to. It matches the LabelId in the answers file. |
- Generate a CSV file for answers. When you call the 'classify' REST API and pass a question or text string, the classifier responds with a list of classes that are best associated with that text string based on its training data and algorithms. If you want to show the user an answer for those classes, you must associate each class with some answer text. The command-line program uses a .csv file as input to create and populate an answer store. Use the following format in the answers.csv: LabelId, AnswerValue, CanonicalQuestion.
| Term | Description |
| ------------- | ------------- |
| LabelId | The unique id of the label for an answer. It matches the LabelId in the questions file. |
| AnswerValue | The text of the answer. |
| CanonicalQuestion | The canonical question that is associated with an answer. |
Classes that have no answer value are excluded and are not added to the answer store.
- Generate your own training and answers JSON files. A command-line program creates a training JSON file and an answers JSON file that you can use in the previous stages for training the classifier and populating the answer store. In the `com.ibm.watson.app.classifier.tools` package of the `questions-with-classifier-ega-war` project, run the GenerateTrainingAndPopulationData.java class and supply it with the .csv input files. Use the following command-line parameters for this program:
```
usage: java com.ibm.watson.app.classifier.tools.GenerateTrainingAndPopulationData
 -ain,--answerInput <answerInput>          input csv file containing answers data
 -aout,--answerOutput <answerOutput>       filename and location for the answer store population data
 -qin,--questionInput <questionInput>      input csv file containing questions and labels
 -qout,--questionOutput <questionOutput>   filename and location for the classifier training data
```
- Replace the sample JSON files with your own files. The sample `answers.json` and `training.json` files are in the `questions-with-classifier-ega-war > src > main > resources` directory.
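Because the LabelId in questions.csv must match the LabelId in answers.csv, and classes without an answer value are excluded from the answer store, it can help to check both files before generating the JSON. The following is a minimal sketch, not part of the app: the helper names are hypothetical, and it assumes comma-separated files with no header row in the column orders described above.

```python
import csv


def read_rows(path):
    """Read a headerless CSV file into a list of rows with stripped cells."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, skipinitialspace=True)
        return [[cell.strip() for cell in row] for row in reader if row]


def check_labels(questions_path="questions.csv", answers_path="answers.csv"):
    """Return (question labels lacking an answer, answer labels with no questions)."""
    # questions.csv columns: QuestionText, LabelId
    question_labels = {row[1] for row in read_rows(questions_path) if len(row) >= 2}
    # answers.csv columns: LabelId, AnswerValue, CanonicalQuestion
    answered = {row[0] for row in read_rows(answers_path)
                if len(row) >= 2 and row[1]}
    return sorted(question_labels - answered), sorted(answered - question_labels)
```

An empty first list means that every class used in training has answer text; an empty second list means that no answer is orphaned.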
What to do next
Use your JSON files to train the classifier and populate the answer store.
- Use curl commands
- Use Eclipse
If you completed the previous stages, the app is deployed without security around any of the internal API endpoints. Running without security is acceptable for development but not for production, particularly for the `/manage` endpoint, which allows access to the answer store.
To deploy the app with security enabled, complete these steps:
- Navigate to the directory that contains the .zip file, which holds the .war and a server.xml with security configured, by running the following command:
    `cd /questions-with-classifier-ega/questions-with-classifier-ega-war/target`
- Re-run the following `cf push` command:
    `cf push <yourAppName> -p questions-with-classifier-ega-war.war`
- In your Bluemix application, define the following environment variable and set the value to the password you want:
    `MANAGE_API_PASSWORD`