/monitor-ibm-cloud-pak-with-watson-openscale

Data analysis, model building, and deploying with Watson Machine Learning with notebook

Primary LanguageJupyter NotebookApache License 2.0Apache-2.0

Monitor IBM Cloud Pak With Watson OpenScale

In this Code Pattern, we will use German Credit data to train, create, and deploy a machine learning model using Watson Machine Learning on Cloud Pak for Data (CP4D). We will create a data mart for this model with Watson OpenScale and configure OpenScale to monitor that deployment, then inject seven days' worth of historical records and measurements for viewing in the OpenScale Insights dashboard.

When the reader has completed this Code Pattern, they will understand how to:

  • Create and deploy a machine learning model using the Watson Machine Learning service on IBM Cloud Pak for Data (ICP4D).
  • Setup Watson OpenScale Data Mart
  • Bind Watson Machine Learning to the Watson OpenScale Data Mart
  • Add subscriptions to the Data Mart
  • Enable payload logging and performance monitor for subscribed assets
  • Enable Quality (Accuracy) monitor
  • Enable Fairness monitor
  • Score the German credit model using the Watson Machine Learning
  • Insert historic payloads, fairness metrics, and quality metrics into the Data Mart
  • Use Data Mart to access tables data via subscription

architecture

Flow

  1. Create a new project on ICP4D
  2. The developer creates a Jupyter Notebook within this project.
  3. OpenScale on ICP4D is connected to a DB2 database, which is used to store Watson OpenScale data.
  4. The notebook is connected to Watson Machine Learning and a model is trained and deployed.
  5. Watson OpenScale is used by the notebook to log payload and monitor performance, quality, and fairness.
  6. OpenScale will monitor the Watson Machine Learning model for performance, fairness, quality, and explainiblity.

Prerequisites

Steps

  1. Clone the repository
  2. Create a new project and deployment space
  3. Configure OpenScale in a Jupyter Notebook
  4. Utilize the dashboard for OpenScale

1. Clone the repository

In a terminal window, execute the following commands:

git clone https://github.com/IBM/monitor-ibm-cloud-pak-with-watson-openscale
cd monitor-ibm-cloud-pak-with-watson-openscale

2. Create a new project and deployment space

Create a new project

In Cloud Pak for Data, we use the concept of a project to collect / organize the resources used to achieve a particular goal (resources to build a solution to a problem). Your project resources can include data, collaborators, and analytic assets like notebooks and models, etc.

  • Launch a browser and navigate to your Cloud Pak for Data deployment.

  • Go the (☰) menu and click Projects:

(☰) Menu -> Projects

  • Click on New project. In the dialog that pops up, select the project type as Analytics project and click Next:

Start a new project

  • Click on the top tile for Create an empty project:

Create an empty project

  • Give the project a unique name, an optional description and click Create:

Pick a name

Create a Deployment Space

Cloud Pak for Data uses the concept of Deployment Spaces to configure and manage the deployment of a set of related deployable assets. These assets can be data files, machine learning models, etc.

  • Go the (☰) menu and click Analyze -> Analytics deployments:

(☰) Menu -> Analytics deployments

  • Click on New deployment space +:

Add New deployment space

  • Click on the top tile for 'Create an empty space':

Create empty deployment space

  • Give your deployment space a unique name, an optional description, then click Create.

Create New deployment space

You will use this space later when you deploy a machine learning model.

Add collaborator

Next, we will add a collaborator to the new deployment space, so that assets we deploy can be monitored in step 4.

  • Click on your new deployment space.

Select deployment space

  • Click on the Access control tab and then click on Add collaborators + on the right.

Deployment space access control

  • Enter "admin" as a Collaborator and select the user from the drop down list. Then click on the Add to list + button. Then click the Add button to finish adding the collaborator.

NOTE: We are adding the user that configured the machine learning instance for OpenScale monitoring. In this case, the user is the admin user. If the user is someone other than "admin", then that user should be added as a collaborator here.

Deployment space add collaborator

You should be brought back to the deployment space page and see your user ID along with the admin user as collaborators for this space.

Deployment space collaborators

3. Configure OpenScale in a Jupyter Notebook

For this part of the pattern we're going to configure our Watson OpenScale service by running a Jupyter Notebook.

Import the notebook

At the project overview click the Add to project + button, and choose Notebook or click the New notebook + option next to the Notebooks section.

Add a new asset

On the next panel select the From URL tab, give your notebook a name, provide the following URL, and choose the Python 3.6 environment:

Provide the notebook URL https://raw.githubusercontent.com/IBM/monitor-ibm-cloud-pak-with-watson-openscale/master/notebooks/ConfigureOpenScale.ipynb.

Add notebook name and URL and click Create notebook.

When the Jupyter notebook is loaded and the kernel is ready then we can start executing cells.

Notebook loaded

Update credentials

  • In the notebook section 1.2 you will add your ICP platform credentials for the WOS_CREDENTIALS.
  • For the url field, change https://w.x.y.z to use the IP address of your ICP cluster, i.e something like: "url": "https://zen-cpd-zen.omid-cp4d-v5-2bef1f4b4097001da9502000c44fc2b2-0001.us-south.containers.appdomain.cloud"
  • For the username, use your login username.
  • For the password, user your login password.
  • For the DATABASE_CREDENTIALS and SCHEMA_NAME values, follow instructions from prerequisites to Create an IBM Cloud instance of DB2 Warehouse

Run the notebook

Important: Make sure that you stop the kernel of your notebook(s) when you are done, in order to prevent leaking of memory resources!

Stop kernel

Spend a minute looking through the sections of the notebook to get an overview. You will run cells individually by highlighting each cell, then either click the Run button at the top of the notebook. While the cell is running, an asterisk ([*]) will show up to the left of the cell. When that cell has finished executing a sequential number will show up (for example, [17]).

Get transactions for Explainability

Under 8.9 Identify transactions for Explainability run the cell. It will produce a series of UIDs for indidvidual ML scoring transactions. Copy one or more of them to examine in the next section.

4. Utilize the dashboard for Openscale

Now that you have created a machine learning model and configured Openscale, you can utilize the OpenScale dashboard to gather insights.

Use the insights dashboard

The Insights Dashboard provides an overview of the models that OpenScale is monitoring.

  • Open the Services tab by clicking the icon in the upper right. Go to the OpenScale tile under the AI category and click Open:

Deploy OpenScale

  • When the dashboard loads, Click on the 'Model Monitors' tab and you will see the one deployment you configured in the previous section.

OpenScale Insight Dashboard Tile Open

Do not worry if the name you see does not match exactly with the screenshot. The deployment name you see will correspond to the variable used in the Jupyter notebook

You will see the triangle with ! under Fairness -> Sex. This indicates that there has been an alert for the Fairness monitor. Alerts are configurable, based on thresholds for fairness outcomes which can be set and altered as desired.

  • By moving your mouse pointer over the graph, you can see the values change, and which contains bias. Click one spot to view the details. Later, we'll click Configure Monitors to get a Fairness endpoint:

OpenScale Fairness Monitor

  • Once you open the details page, you can see more information. Note that you can choose the radio buttons for your choice of data (Payload + Perturbed, Payload, Training, Debiased):

OpenScale Fairness Detail

  • Click on View Transactions to drill deeper. Here you have radio buttons for All transactions and Biased transactions. Each of the individual transactions can be examined to see them in detail. Doing so will cache that transaction, as we will see later.

OpenScale View Transactions

  • Now, go back to the Insights Dashboard page by clicking on the left-hand menu icon for Insights, make sure that you are on the Model monitors tab, and click the 3-dot menu on the tile and then Configure monitors:

OpenScale Configure Monitors

  • Click the Fairness menu, then the Debias Endpoint tab. This is the REST endpoint that offers a debiased version of the credit risk ML model, based on the features that were configured (i.e. Sex and Age). It will present an inference, or score, that attempts to remove the bias that has been detected:

OpenScale Monitors Fairness

  • Then scroll down for code examples on how to use the Fairness Debiased endpoint. You can see code snippets using cURL, Java, and Python, which can be used in your scripts or applications to access this debiased endpoint:

OpenScale Debiased endpoint

  • Similarly, you can choose the Quality menu and choose the Feedback tab to get code for Feedback Logging. This provides an endpoint for sending fresh test data for ongoing quality evaluation. You can upload feedback data here or work with your developer to integrate the code snippet provided to publish feedback data to your Watson OpenScale database.

Examine an individual transaction

  • Click on the left-hand menu icon for Explain a transaction and click one of the transactions that have been run. These are the transactions that have been cached. Alternately, enter the transaction UID you copied after running the notebook from step 3.

NOTE: Each time you create the Explainibility data, the perterbation algorithm is sending 1000's of requests to the deployed Machine Learning REST endpoint, so the first time this is done can take a few seconds.

Explain a transaction

  • From the info icon next to Details:

Explanations show the most significant factors when determining an outcome. Classification models also include advanced explanations. Advanced explanations are not available for regression, image, and unstructured text models.

  • Click on the info icon next to Minimum changes for No Risk outcome and look at the feature values:

Pertinent Negative If the feature values were set to these values, the prediction would change. This is the minimum set of changes in feature values to generate a different prediction. Each feature value is changed so that it moves towards its median value in the training data.

  • Click on the info icon next to Maximum changes allowed for the same outcome and look at the feature values:

Pertinent Positive The prediction will not change even if the feature values are set to these values. This is the maximum change allowed while maintaining the existing prediction. Each feature value is changed so that it moves towards its median value in the training data.

You can see under Most important factors influencing prediction the Feature, Value, and Weight of the most important factors for this score.

A full breakdown of the factors contributing to either "Risk" or "No Risk" are at the bottom.

Using the Analytics tools

  • From the model monitor view, click on Analytics -> Chart Builder. Here you can create charts using various Measurements, Features, and Dimensions of your machine learning model. Change them and examine the charts that are created:

Analytics Chart builder

  • You can also see a chart that breaks down Predictions by Confidence:

Analytics Predictions by Confidence

NOTE: Returning to the Cloud Pak for Data requires altering the URL in the browser, so that the cluster name is the only thing present. Remove anything in the browser URL containing /aiopenscale/*.

License

Apache 2.0