Monitor your machine learning models using Watson OpenScale in IBM Cloud Pak for Data

Note: This code pattern is part of a series that explores telecom call-drop predictions using IBM Cloud Pak for Data, Data Virtualization, Watson OpenScale, and Cognos Analytics. Other assets in the series include:

Tutorial - Query across distributed data sources as one: Data virtualization for data analytics

Code pattern - Predict, Manage and Monitor the call drops of cell towers using Cloud Pak for Data

Tutorial - Build dashboards in Cognos Analytics on IBM Cloud Pak for Data

Businesses today are increasingly certain that AI will be a driving force in the evolution of their industries over the next few years. However, several factors challenge the widespread adoption of AI in a business and make it harder to successfully infuse AI into a product or solution and achieve the expected outcomes. A few are listed below:

Building trust - Organizations and businesses tend to be skeptical about AI because of its "black box" nature, resulting in many promising models not going into production.
Algorithm bias - Another inherent problem with AI systems is that they are only as good, or as bad, as the data they are trained on. If the input data is filled with racial, gender, communal or ethnic biases, your model's accuracy will eventually drift.
Making decisions explainable - How can the model prove the reasoning behind its decision-making? It is critical that AI outcomes are fully explainable by keeping a complete track of the inputs and outputs of any AI-powered application.

What if there is one console that makes it easier for business users to track and measure AI outcomes?

In this code pattern, we demonstrate a way to monitor the AI models in an application using Watson OpenScale, using a Telecom Call Drop Prediction Model as the example. After completing this code pattern, you will understand:

How to store custom models using open source technology on Watson Machine Learning.
How to deploy a model and connect the model deployment to Watson OpenScale on Cloud Pak for Data and on IBM Cloud.
How to set up Model Fairness and Model Quality monitors with Watson OpenScale on Cloud Pak for Data and on IBM Cloud, using a Python notebook.
How to create a project and set up a Python notebook on Cloud Pak for Data.

Flow

Data is stored in the internal Db of Cloud Pak for Data.
The joined data is stored back to the internal Db of Cloud Pak for Data and assigned to the current working project.
Create machine learning models using Jupyter Python notebooks to predict call drop, one cell tower at a time.
The model is trained and/or stored in Watson Machine Learning, which is also connected to Watson OpenScale.
Configure Fairness, Quality and Explainability Monitors for each cell tower's model, and present them within Cloud Pak for Data, or on other external Clouds (Multi-Cloud Architecture).

Featured technologies

Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
Pandas: An open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Prerequisites

Steps

Note: This code pattern includes steps to use WML on IBM Cloud as well as WML within your Cloud Pak for Data instance.

Create a Watson Machine Learning instance
Create a new project in Cloud Pak for Data
Import notebook to Cloud Pak for Data
Follow the steps in the notebook
Display deployment in Watson OpenScale
Additional use-case for Watson OpenScale

1. Create a Watson Machine Learning instance

Note: This step is only for WML on IBM Cloud users.

Create a new Watson Machine Learning instance on IBM Cloud. Log in to IBM Cloud, or if you don't have an account, sign up by following the on-screen instructions.
Set the location to the Dallas region and click Create.
Once the instance is created, click on Service Credentials. Click on New Credentials and then click on View Credentials. Copy the credentials using the copy icon.

NOTE: Save the credentials. They will be required when running the notebook.

2. Create a new project in Cloud Pak for Data

Once you log in to your Cloud Pak for Data instance, click on the (☰) menu icon in the top left corner of your screen and click Projects.
When you reach the Project list, click on Create an Empty Project. You will be navigated to a new page where you can enter the desired name (or Telco_CallDrop). Once you click on Ok you will go to a new screen. Click on Create to complete your project creation.

For WML on IBM Cloud Pak for Data users

Create a new Deployment Space

Go to the Settings tab. Click on Associate a Deployment Space.
Enter a name for the new space and store this name for future reference.

Note: If you already have a Deployment Space created for your project, skip this step and store its name for future reference.

3. Import notebook to Cloud Pak for Data

In the created project page, click on the Add to Project button. Then click on Notebook.
You will be navigated to a new page. Click on the From URL tab.
- For WML on IBM Cloud users, use this URL: https://github.com/IBM/icp4d-telco-monitor-models-with-wml-openscale/blob/master/notebooks/OpenScale_WML_Cloud.ipynb
- For WML on Cloud Pak for Data users, use this URL: https://github.com/IBM/icp4d-telco-monitor-models-with-wml-openscale/blob/master/notebooks/OpenScale_WML_CPD_3.0.ipynb

Enter the name TowerA_Call_Drop and click on the Create button.

Note: Choose the Python 3.6 environment.

4. Follow the steps in the notebook

You will run cells individually by highlighting each cell and then clicking the Run button at the top of the notebook. While the cell is running, an asterisk ([*]) will show up to the left of the cell. When that cell has finished executing, a sequential number will show up (e.g., [17]).

NOTE: For reference, we have included a completed notebook in the /examples directory of this repo. This version of the notebook includes all the executed steps and outputs. See https://github.com/IBM/icp4d-telco-monitor-models-with-wml-openscale/blob/master/examples/Setup_your_AIOS_Dashboard-Example.jupyter-py36.ipynb

4.1. Install the Necessary Packages

Click on the Run icon and install the necessary packages described in the first 3 cells.
Next, restart your kernel by either clicking the restart icon or Kernel > Restart.

4.2. Add the Dataset

Download the dataset using this link: https://github.com/IBM/icp4d-telco-monitor-models-with-wml-openscale/blob/master/data/call_drop_data_train.csv . Right click on Download and click on Save link as. Save the file with the .csv extension.

In section 2.2 Add Dataset, highlight the blank cell by clicking on it. Click on the 10/01 button and then browse to select the downloaded data set.
Select the call_drop_data_train.csv file that you added to the project. Under the Insert to code option, click Insert Pandas DataFrame.

IMPORTANT: Ensure the variable name is set to df_data_1.
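
The code that Insert Pandas DataFrame generates depends on your environment, so the following is only a minimal sketch of the end result: a DataFrame named df_data_1 read from the CSV. It assumes the file is readable from a path or file-like object. The helper name load_call_drop_data, the sample rows, and the call_drop column are hypothetical and used for illustration only.

```python
import io

import pandas as pd


def load_call_drop_data(source):
    """Read call-drop CSV data from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Hypothetical sample standing in for call_drop_data_train.csv.
sample_csv = io.StringIO(
    "outgoing_site_id,conds,call_drop\n"
    "Tower A,Fog,1\n"
    "Tower A,Clear,0\n"
)

# The rest of the notebook expects this exact variable name.
df_data_1 = load_call_drop_data(sample_csv)
print(df_data_1.shape)
```

If the generated cell uses a different variable name, renaming it to df_data_1 keeps the later cells working unchanged.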

4.3. Supply the necessary variables and credentials

For WML on IBM Cloud users

Under section 2.1 Global Variables enter the following:
1. Your desired Model Name
2. Your desired Model Deployment Name
3. The name of an empty schema in your database
Note: Make sure the schema is empty, i.e. it has no content.

Under section 2.3 Add your WML Credentials, add the credentials you had copied earlier when creating the Watson Machine Learning instance.

Under section 2.4 Update your AIOS Credentials, add the necessary Cloud Pak for Data instance details as instructed in the cell. Replace the <> with the information within the brackets.

Under section 2.5 Add your Db Credentials, add your database credentials and make sure the keys given in the template have values filled in.

NOTE: These credentials will only be used if you have not already configured your Watson OpenScale Datamart. As you will see in section 4.3 Setup the Datamart on AI OpenScale, if the Datamart already exists, it will be used. If not, the notebook will create the Datamart, using these credentials to configure it.
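
Before running the cells that use these values, it can help to confirm that nothing in a credentials template was left blank. The following is a minimal sketch of such a check, not part of the notebook. The function name find_unfilled_keys and the key names in the sample dictionary are hypothetical; use the keys shown in the notebook's own template.

```python
def find_unfilled_keys(credentials):
    """Return the keys whose values are empty or still look like <placeholder> text."""
    unfilled = []
    for key, value in credentials.items():
        text = str(value).strip() if value is not None else ""
        if not text or (text.startswith("<") and text.endswith(">")):
            unfilled.append(key)
    return unfilled


# Hypothetical template; the real key names are the ones shown in the notebook cell.
db_credentials = {
    "hostname": "db.example.com",
    "username": "<your username>",
    "password": "",
    "port": 50000,
}

print(find_unfilled_keys(db_credentials))
```

An empty list means every key has a value; any key that is returned still needs to be filled in.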

For WML on IBM Cloud Pak for Data users

Enter the Deployment Space Associated for the project as created in Step 2.
Under section 2.5 Add your Db Credentials, add your database credentials and make sure the keys given in the template have values filled in.

NOTE: These credentials will only be used if you have not already configured your Watson OpenScale Datamart. As you will see in section 4.3 Setup the Datamart on AI OpenScale, if the Datamart already exists, it will be used. If not, the notebook will create the Datamart, using these credentials to configure it.

4.4 Add the Subscription ID

Continue running the notebook up to section 4.5 Create a new Subscription.
Before running the remaining cells (again, it is better to run them cell by cell), update the variable subscription_id in section 4.6 Perform Initial Payload Logging with the ID you created in the previous cell.
You will find this id in the table above the cell, as instructed in the notebook.
Now, run the rest of the notebook to completion.

5. Display deployment in Watson OpenScale

Open the Watson OpenScale add-on on Cloud Pak for Data. Click the add-ons icon located in the top right corner of the panel, which will show the list of add-ons. Use the option menu in the Watson OpenScale add-on tile to open it.
You will see the initial setup looking like this:

6. Additional use-case for Watson OpenScale

Included in the data directory of this repo is a file named Telco_training_final.csv. This contains call drop data for multiple cell towers. If you filter the data on the field outgoing_site_id, you can create a file for each cell tower you would like to monitor. Using the same notebook, you can create a new deployment for each model.
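
As a rough illustration of that filtering step, the sketch below groups a DataFrame by outgoing_site_id and keeps one DataFrame per tower. It is a minimal example under stated assumptions: the helper split_by_tower, the sample rows, the call_drop column, and the output file naming are all hypothetical, and in practice you would load Telco_training_final.csv instead of building the DataFrame by hand.

```python
import pandas as pd


def split_by_tower(df, site_column="outgoing_site_id"):
    """Return a dict mapping each cell tower ID to the rows recorded for it."""
    return {
        site_id: group.reset_index(drop=True)
        for site_id, group in df.groupby(site_column)
    }


# Hypothetical rows standing in for Telco_training_final.csv.
df = pd.DataFrame({
    "outgoing_site_id": ["A", "B", "A", "C"],
    "conds": ["Fog", "Clear", "Rain", "Haze"],
    "call_drop": [1, 0, 0, 1],
})

towers = split_by_tower(df)
for site_id, tower_df in towers.items():
    # Uncomment to write one training file per tower (file name is illustrative):
    # tower_df.to_csv(f"call_drop_data_{site_id}.csv", index=False)
    print(site_id, len(tower_df))
```

Each resulting file can then be added to the project and used with the same notebook to train and deploy a model for that tower.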

In the following description, we show a dashboard that consists of call drop prediction models for 3 different towers - A, B and C.

OpenScale is configured so that it can monitor how your models are performing over time. The following screen shot gives one such snapshot:

As we can see, the model for Tower C demonstrates a fairness bias warning of 92%.

What is a fairness-bias and why do we need to mitigate it?

Data in this day and age comes from a wide variety of sources. Being able to give the right input to a machine learning model is crucial to a successful machine learning implementation in an application. However, if the underlying data is biased in any form, there is a risk that structural inequalities and unfair biases are not just replicated, but also amplified.

Hence, Watson OpenScale is useful when you need to identify bias in real time (i.e., when payload flows into your machine learning system).

Now that we have identified a bias, let us see the consequence of the bias and what exactly is causing it in this example.

Understanding bias

Select the desired deployment using options > View Details. You will see a full dashboard with various charts telling you how your model has been performing over various time intervals.

You may not receive the same values as in the screenshots below. Since we have only done an initial setup, you will see only one line in your charts.

Now select _conds in the Fairness menu on the left side. If you hover over the graph, you can select Click to view details. Select a timeline according to your current date.

To interpret the details, have a look at the screenshot below:

Our dataset contains the following values for this field: Clear, Fog, Haze, Partly Cloudy, and Rain. The graph above suggests that the value Fog contains a bias.

So, what does this really mean?

As the legend shows, the chart distinguishes the Favorable Outcome from the Unfavorable Outcome. The bias value tells us that the field value Fog has a lower percentage of favorable outcomes than the other values. Why does this affect my model? It creates a disparity and thereby allows your model to be biased towards the unfavorable outcome (in the case of fog).
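
To make this comparison concrete, here is a minimal sketch of one common way to quantify such a disparity: the ratio of the favorable-outcome rate for a monitored value to the rate for all remaining values. This is an illustration in plain Python and is not presented as Watson OpenScale's own computation. The sample records and function names are hypothetical, and the sketch assumes 0 is the favorable outcome (the call does not drop) and 1 is the unfavorable one.

```python
def favorable_rate(records, favorable=0):
    """Fraction of (condition, outcome) records whose outcome is favorable."""
    if not records:
        raise ValueError("no records to evaluate")
    return sum(1 for _, outcome in records if outcome == favorable) / len(records)


def disparate_impact(records, monitored, favorable=0):
    """Ratio of favorable rates: monitored condition vs. all other conditions."""
    monitored_rows = [r for r in records if r[0] == monitored]
    reference_rows = [r for r in records if r[0] != monitored]
    return favorable_rate(monitored_rows, favorable) / favorable_rate(
        reference_rows, favorable
    )


# Hypothetical predictions: (weather condition, predicted outcome), 1 = call drop.
records = [
    ("Fog", 1), ("Fog", 1), ("Fog", 0), ("Fog", 1),
    ("Clear", 0), ("Clear", 0), ("Rain", 1), ("Haze", 0),
]

ratio = disparate_impact(records, monitored="Fog")
print(f"Fog favorable rate relative to other conditions: {ratio:.2f}")
```

A ratio well below 1 means the monitored value receives favorable outcomes far less often than the rest, which is the kind of disparity the fairness chart highlights.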

Ok, so why does it really matter to me?

The best way to explain the consequence of the disparity is with an example. Suppose we allow the disparity to creep in. Our model will slowly start believing that the moment the weather condition is foggy, it needs to predict the unfavorable outcome (1), which in this case means the call will drop. Now, this may not be the case in reality. There may be other conditions along with the fog that are causing call drops. Hence, it is important to keep monitoring your machine learning model on a real-time basis.

Understanding quality monitors

Now click on the View Transactions button shown in the previous screenshot. You will be presented with a list of transactions. A transaction is a record of the data that flows into your model. The list gives you the logs and, if a bias is identified, specifies that as well.

How to see your model performance

Come back to the dashboard and explore the Area under ROC section. Again, hover over the graph and select Click to view details.

In this example, we have set up only the Area under ROC as a quality measure, as shown in the above image.

You will receive estimates and the configured quality monitor values, as shown below. The image shows the overall True Positive and False Positive ratio for the monitored model.
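
For readers who want to see what these quality values measure, the following is a minimal plain-Python sketch of the true positive rate, the false positive rate, and the area under the ROC curve. It is an illustration only, not the way Watson OpenScale computes its metrics. The labels, scores, the 0.5 decision threshold, and the function names are hypothetical.

```python
def true_false_positive_rates(labels, predictions):
    """Return (TPR, FPR) for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def area_under_roc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = call drop) and model scores.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
predictions = [1 if s >= 0.5 else 0 for s in scores]

tpr, fpr = true_false_positive_rates(labels, predictions)
auc = area_under_roc(labels, scores)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} AUC={auc:.2f}")
```

An area under ROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value that declines over time is a signal that the model should be reviewed.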

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ