This repository will not be updated. The repository will be kept available in read-only mode.
In this code pattern, we will use IBM Watson Machine Learning and Watson Studio — which allows data scientists and analysts to quickly build and prototype models — to monitor deployments, and to learn over time as more data becomes available. Performance Monitoring and Continuous Learning enables machine learning models to re-train on new data supplied by the user or other data sources. All applications and analysis tools that depend on the model are automatically updated as Watson Studio handles the selection and deployment of the best model.
In this code pattern, we’ll solve a problem for the City of Chicago using the Model Builder to model building violations. We’ll predict which buildings are most likely to fail an inspection and rank them by that likelihood, saving time and resources for the city and building inspectors. We’ll begin by building a model on publicly available data from 2017, starting with the September data. Then we will introduce data from October, November, and December to simulate learning and model re-training over time.
When the reader has completed this Code Pattern, they will understand how to:
- Use Watson Studio to create a project and associate services
- Use the IBM Watson Machine Learning service for model management (continuous learning system) and deployment (online, batch, streaming)
- Use the Apache Spark-as-a-service cluster computing framework, which is optimized for extremely fast and large-scale data processing
- Create and deploy self-learning Watson Machine Learning models
- Initial source data is loaded into IBM Db2 Warehouse on Cloud database.
- The source data is then loaded, as a data asset, into Watson Studio.
- The Watson Machine Learning service uses the source data and computes an evaluation using Apache Spark-as-a-service to create a machine learning model, and saves the evaluation information back to the Db2 Warehouse on Cloud database.
- Apache Spark-as-a-service is used to compute the evaluation.
- Feedback data is uploaded to the feedback table in the Db2 Warehouse on Cloud database.
- Once the evaluation is done the Watson Machine Learning service creates a machine learning model.
- The model data is exposed through an API.
- Applications can use the API to evaluate new data and create a new model based on continuous learning.
- Watson Machine Learning: Use trusted data to put machine learning and deep learning models into production. Leverage an automated, collaborative workflow to grow intelligent business applications easily and with more confidence.
- Apache Spark: Apache Spark is an open source cluster computing framework optimized for extremely fast and large scale data processing, which you can access via the newly integrated notebook interface IBM Analytics for Apache Spark.
- IBM Db2 Warehouse on Cloud: IBM Db2 Warehouse on Cloud is an elastic, fully managed cloud data warehouse service that's powered by IBM BLU Acceleration® technology for increased performance and optimization of analytics at a massive scale.
- Watson Studio: IBM Watson Studio provides tools for data scientists, application developers and subject matter experts to collaboratively and easily work with data to build and train models at scale. It gives you the flexibility to build models where your data resides and deploy anywhere in a hybrid environment so you can operationalize data science faster.
- Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
- Machine Learning: Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn" (e.g., progressively improve performance on a specific task) from data, without being explicitly programmed.
- Continuous Learning: Continuous learning is a method of machine learning, in which input data is continuously used to extend the existing model's knowledge i.e. to further train the model.
- Clone the repo
- Create Watson Studio Project
- Create Db2 Warehouse on Cloud database and add the connection to Watson Studio
- Create and load data into Db2 Warehouse on Cloud database
- Add connected asset into Watson Studio
- Create Apache Spark as a service with IBM Cloud
- Create Watson Machine Learning with IBM Cloud
- Add new Watson Machine Learning Model to Watson Studio
- Add Feedback data and new evaluations to the continuously learning model
- Deploy the model to expose it through an API
- Test the model
Clone the continuous-learning-with-watson-ml-and-db2 repository locally. In a terminal, run:
$ git clone https://github.com/IBM/continuous-learning-with-watson-ml-and-db2
$ cd continuous-learning-with-watson-ml-and-db2
If you do not already have an IBM Cloud account, sign up for IBM Cloud and log in to your IBM Cloud account.
First you will need to create an Object Storage service if you don't already have one. From the catalog, search for object storage, select the Object Storage service, choose the lite plan and click create.
Go back to the catalog, search for Watson Studio, select it, choose the lite plan and click create.
Create a new project by clicking the New Project link, choose Complete, give it a name and click create.
From the IBM Cloud catalog, search for Db2 Warehouse on Cloud and create one using the appropriate plan.
Once the service is created, create new credentials by selecting the Service Credentials option in the left navigation panel. Make sure to save the credentials for upcoming steps.
From the Watson Studio project that you created earlier, go to + Add to Project and choose Connection.
Select Db2 Warehouse from the available options to connect to the Db2 Warehouse on Cloud database you created earlier.
Configure the connection based on the Db2 credentials you saved earlier.
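If you also want to reach the same database from your own scripts, the saved credentials can be turned into a connection string. The sketch below is a minimal example, not the pattern's own code: the key names (hostname, port, db, username, password) and all values are assumptions about what the Service Credentials JSON contains, so adjust them to match what you saved.

```python
import json


def build_db2_dsn(credentials: dict) -> str:
    """Build a Db2 connection string from saved service credentials.

    The key names used here are assumptions about the credentials JSON;
    rename them to match the credentials you actually saved.
    """
    parts = {
        "DATABASE": credentials["db"],
        "HOSTNAME": credentials["hostname"],
        "PORT": credentials["port"],
        "PROTOCOL": "TCPIP",
        "UID": credentials["username"],
        "PWD": credentials["password"],
    }
    # Each setting is written as KEY=value; in insertion order.
    return "".join(f"{key}={value};" for key, value in parts.items())


# Placeholder values only; paste the real ones from Service Credentials.
example_credentials = json.loads("""
{
  "hostname": "db2w-example.example.com",
  "port": 50000,
  "db": "BLUDB",
  "username": "example_user",
  "password": "example_password"
}
""")

dsn = build_db2_dsn(example_credentials)
```

With the ibm_db Python driver installed, a string in this form is the kind of value passed as the first argument to ibm_db.connect. Avoid printing or committing the string, since it contains the password.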
From the IBM Db2 Warehouse service page, click Manage and click Open to go to the IBM Db2 Warehouse on Cloud console.
Open the hamburger menu and select RUN SQL to open a SQL editor.
In the SQL editor, paste the SQL statement copied from the file and choose the Run All option from the RUN drop-down list at the top right.
Similarly, paste the SQL statement copied from the file into the SQL editor and choose the Run All option from the RUN drop-down list at the top right.
Note that the "_training" column should be lower case in the create statement and in the trigger.
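The reason the note above matters is that Db2 folds unquoted identifiers to upper case, while an identifier written in double quotes keeps its exact case. The sketch below is a simplified model of that rule for illustration only; it is not a SQL parser, and the helper names are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Return a double-quoted SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def folded_name(identifier: str) -> str:
    """Simplified model of how an identifier written in SQL is resolved.

    Quoted identifiers keep their case; unquoted ones are upper-cased.
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1].replace('""', '"')
    return identifier.upper()


# Without quotes the name would be upper-cased; with quotes it stays lower case.
unquoted = folded_name("_training")
quoted = folded_name(quote_identifier("_training"))
```

Here `quoted` keeps the lower-case name while `unquoted` does not, which is why the column is written as "_training" (with the quotes) in both the create statement and the trigger.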
Next we will load the violations table from a CSV file. Click LOAD from the hamburger menu, which will bring you to a page where you can upload a .csv file.
Browse to the CSV file in the project directory that you cloned earlier and click Next.
Choose the correct Schema, select the table VIOLATIONS and click Next.
Click Next on the next screen and click Begin Load to load the source data from the CSV file into the VIOLATIONS table.
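Before uploading, it can help to confirm that the CSV file has a header and that every record has the same number of columns, since malformed rows are a common cause of failed loads. The following is a minimal sketch using only the Python standard library. The sample data is made up for illustration (only the INSPECTION_STATUS column name comes from this pattern), and `summarize_csv` is a hypothetical helper.

```python
import csv
import io


def summarize_csv(text_stream):
    """Return (header, record_count, bad_records) for a CSV text stream.

    bad_records lists the 1-based record numbers (the header is record 1)
    whose column count differs from the header's.
    """
    reader = csv.reader(text_stream)
    header = next(reader)
    record_count = 0
    bad_records = []
    for record_number, row in enumerate(reader, start=2):
        record_count += 1
        if len(row) != len(header):
            bad_records.append(record_number)
    return header, record_count, bad_records


# Made-up sample; the last record is deliberately missing a column.
sample = io.StringIO(
    "ID,INSPECTION_STATUS\n"
    "1,PASSED\n"
    "2,FAILED\n"
    "3\n"
)
header, record_count, bad_records = summarize_csv(sample)
```

To check a real file, open it with `open(path, newline="")` and pass the file object to `summarize_csv` instead of the in-memory sample.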
In Watson Studio, go to your project, select + Add to Project and choose the Connected assets option from the dropdown list.
Provide a name, and click Select source.
Choose the Db2 database and the table that you created in the previous step. Click Create.
In the next screen, click Create to create the connected asset, which will be used when creating the Watson Machine Learning model.
From the catalog in IBM Cloud, search for the keyword spark and choose the Apache Spark service.
Create the service using the lite plan.
Once created, we need to add this service to Watson Studio. Go to your Watson Studio project, select settings and, from the + Add Service dropdown list, select spark and add the existing Spark service that you have just created.
From the catalog in IBM Cloud, search for the keyword machine learning and choose the IBM Machine Learning service.
Create the service using the lite plan.
As with the Spark service in the previous step, add the Machine Learning service you just created to your Watson Studio project.
From the Assets tab of your Watson Studio project, select + New Watson Machine Learning Model.
Provide a name, choose the Machine Learning and Apache Spark instances that you added to your project, choose Model Builder for the model type, choose Manual so that you can prepare your own data, and click Create.
Select the data asset that you created earlier from the options.
Once the data is loaded, choose INSPECTION_STATUS as the column to predict for new data and All for the feature columns. We will be using Binary Classification. Add estimators by clicking the + Add Estimators link; in our case we will be using Logistic Regression and Decision Tree Classifier. You can select others as well, depending on which estimator algorithm you want to use.
Once the training and evaluation are done, choose the estimator that performed the best and then click Save.
Once the Watson Machine Learning Model is saved, select the Evaluation tab. First we need to configure the performance monitoring.
- Add the Spark service from the dropdown list. It's the one that you added to your Watson Studio project.
- Choose areaUnderPR (performance metric of the model) and set the threshold to 0.8. This means that if the performance is under 0.8, the model needs to be re-trained using all the source data and the new data; this is the continuous learning step.
- Use 500 as the record count and click Save.
- For Auto Retrain, select when model performance is below threshold.
- For Auto Deploy, select when performance is better than previous model.
- Add the connection by selecting Select Feedback Reference Data and select the Db2 connection that you previously created.
- Once the feedback data is loaded, select New Evaluation to evaluate the uploaded feedback data. You can unzip the provided data, Chicago building inspection data by month 2017, in the repo and use that monthly inspection data as feedback data.
- When the evaluation is completed, we can see where the threshold value lies for this new feedback data. The diagram below shows that the performance exceeds the threshold value and hence the new version of the model is automatically deployed.
- You can also see the list of evaluations that have been completed and see how the model has been continuously learning.
- You can upload new feedback data repeatedly from the provided data, Chicago building inspection data by month 2017, so that the model continuously learns.
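The Auto Retrain and Auto Deploy settings above amount to a simple decision rule per evaluation round. The sketch below models that rule in plain Python under stated assumptions: it only mirrors the two configured conditions (retrain when the metric is below the threshold, deploy when the retrained model scores better than the previous one) and is not how the Watson Machine Learning service is implemented. The function name, the callback, and the example scores are hypothetical.

```python
def plan_actions(current_score, retrain_and_score, threshold=0.8):
    """Return the actions taken for one evaluation round.

    current_score: metric (for example areaUnderPR) of the deployed model
        on the latest feedback data.
    retrain_and_score: callable that retrains the model and returns the
        new model's metric; it is only called when retraining is needed.
    """
    actions = ["evaluate"]
    if current_score >= threshold:
        # Performance is acceptable, so nothing else happens this round.
        return actions
    actions.append("retrain")
    candidate_score = retrain_and_score()
    if candidate_score > current_score:
        # Auto Deploy: only replace the model when the new one is better.
        actions.append("deploy")
    return actions


# Made-up scores for three rounds of feedback data.
healthy = plan_actions(0.85, lambda: 0.90)
improved = plan_actions(0.72, lambda: 0.86)
not_improved = plan_actions(0.72, lambda: 0.70)
```

In the first round the score is above 0.8, so the model is left alone. In the second it drops below the threshold and the retrained model is deployed. In the third the retrained model is not better, so the previous deployment stays in place.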
- Select the model, and then select the Deployments tab. Click + Add Deployment to add a new deployment.
- Provide a name and choose Web Service as the deployment type.
- Now the model is exposed through an API. If you select the Implementation tab, you can see different examples of how to use the newly created API.
You can access and test the API programmatically, or use curl commands. You can also go to the Test tab and provide a new set of data to evaluate the inspection status.
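As a rough illustration of programmatic access, the sketch below builds (but does not send) an HTTP scoring request with the Python standard library. Everything specific in it is an assumption: the URL and token are placeholders, the field names are made up, and the payload shape with `fields` and `values` keys should be checked against the examples shown on the Implementation tab of your own deployment.

```python
import json
import urllib.request


def build_scoring_request(scoring_url, token, fields, rows):
    """Build a POST request that carries rows of data to be scored.

    The {"fields": [...], "values": [[...]]} payload shape is an
    assumption; confirm it against your deployment's Implementation tab.
    """
    payload = {"fields": fields, "values": rows}
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder endpoint, token, and field names for illustration only.
request = build_scoring_request(
    "https://example.invalid/scoring-endpoint",
    "EXAMPLE_TOKEN",
    ["FIELD_A", "FIELD_B"],
    [["value-1", 42]],
)

# To actually call the service, replace the placeholders and then run:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the payload before any call is made against a real deployment.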
The result of the evaluation is shown in a horizontal graph located on the right side of the page.
If the evaluation gives an error as shown below, you need to upgrade the Machine Learning service instance to the Standard plan.
- IBM Watson Studio documentation
- IBM Secure Gateway documentation
- Docker documentation
- Db2 Warehouse on Cloud documentation
- Related code pattern: Continuous learning on Db2
- Related video: Continuous Learning on Watson Data Platform
- Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
- AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
- With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.