As shown above, this application leverages machine learning models to predict insurance charges, and it helps customers understand how factors such as smoking or BMI affect their premiums.
As the value of gross insurance premiums worldwide continues to skyrocket past 5 trillion dollars, we know that many of these costs are preventable. For example, eliminating smoking and lowering your BMI by a few points could mean shaving thousands of dollars off of your premium charges. In this application, we study the effects of age, smoking, BMI, gender, and region to determine how much of a difference these factors can make to your insurance premium. By using our application, customers see the radical difference their lifestyle choices make in their insurance charges. By leveraging AI and machine learning, we help customers understand just how much smoking increases their premium, by predicting how much they will have to pay within seconds.
Using IBM AutoAI, you automate all the tasks involved in building predictive models for different requirements. You see how AutoAI generates great models quickly, which saves time and effort and aids in a faster decision-making process. You create a model from a data set that includes age, sex, BMI, number of children, smoking preference, region, and charges to predict the health insurance premium that an individual pays.
When you have completed this code pattern, you understand how to:
- Quickly set up the services on IBM Cloud for building the model.
- Ingest the data and initiate the AutoAI process.
- Build different models using AutoAI and evaluate the performance.
- Choose the best model and complete the deployment.
- Generate predictions using the deployed model by making REST calls.
- Compare the process of using AutoAI and building the model manually.
- Visualize the deployed model using a front-end application.
- The user creates an IBM Watson Studio Service on IBM Cloud.
- The user creates an IBM Cloud Object Storage Service and adds that to Watson Studio.
- The user uploads the insurance premium data file into Watson Studio.
- The user creates an AutoAI Experiment to predict insurance premiums on Watson Studio.
- AutoAI uses Watson Machine Learning to create several models, and the user deploys the best performing model.
- The user uses the Flask web-application to connect to the deployed model and predict an insurance charge.
- IBM Watson Studio - IBM Watson® Studio helps data scientists and analysts prepare data and build models at scale across any cloud.
- IBM Watson Machine Learning - IBM Watson® Machine Learning helps data scientists and developers accelerate AI and machine-learning deployment.
- IBM Cloud Object Storage - IBM Cloud™ Object Storage makes it possible to store practically limitless amounts of data, simply and cost effectively.
- artificial-intelligence - Build and train models, and create apps, with a trusted AI-infused platform.
- Python - Python is an interpreted, high-level, general-purpose programming language.
This code pattern assumes you have an IBM Cloud account. Go to the link below to sign up for a no-charge trial account - no credit card required.
- Download the data set
- Clone the repo
- Explore the data (optional)
- Create IBM Cloud services
- Create and Run AutoAI experiment
- Create a deployment and test your model
- Create a notebook from your model (optional)
- Run the application
We will use an insurance data set from Kaggle. You can find it here.
Click on the `Download` button to download a file named `insurance-premium-prediction.zip`. Once you unzip the file, you should see `insurance.csv`.
This is the data set we will use for the remainder of the example. Remember that this example is purely educational, and you
could use any data set you want - we just happened to choose this one.
Clone this repo onto your computer in the destination of your choice:

```
git clone https://github.com/IBM/predict-insurance-charges-with-ai
```
This gives you access to the notebooks in the `notebooks` directory. To explore the data before creating a model, you can look at the Claim Amount Exploratory notebook: create an IBM Cloud Object Storage service and paste your credentials into the notebook to run it. This step is purely optional.
If you want to run the notebook that is explored below, go to `notebooks/Claim Amount Exploratory.ipynb`.
- Within Watson Studio, you explore the data before you create any machine learning models. You want to understand the data and find any trends between what you are trying to predict (insurance premium charges) and the data's features.
- Once you import the data into a data frame and call the `df_claim.head()` function, you see the first five rows of the data set. The features are `age`, `sex`, `bmi`, `children`, `smoker`, and `region`.
- To check if there is a strong relationship between `bmi` and `charges`, you create a scatter plot using the seaborn and matplotlib libraries. You see that there is no strong correlation between `bmi` and `charges`, as shown below.
- To check if there is a strong relationship between `sex` and `charges`, you create a box plot. You see that the average claims for males and females are similar, whereas males have a bigger proportion of the higher claims.
- To check if there is a strong relationship between being a `smoker` and `charges`, you create a box plot. You see that if you are a smoker, your claims are much higher on average.
- Let's see if the `smoker` group is well represented. As you see below, it is: there are around 300 smokers and around 1,000 non-smokers.
- To check if there is a strong relationship between `age` and `charges`, you create a scatter plot. You see that claim amounts increase with age and tend to form groups around 12,000, 30,000, and 40,000.
If you want to see all of the code, and run the notebook yourself, check the data folder above.
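The checks in the notebook can be sketched with a few lines of pandas. The rows below are made-up sample values shaped like `insurance.csv`, not the real data:

```python
import pandas as pd

# Made-up rows shaped like insurance.csv (illustrative values only)
df_claim = pd.DataFrame({
    "age":      [27, 45, 52, 23, 60],
    "sex":      ["male", "female", "male", "female", "male"],
    "bmi":      [22.0, 31.5, 28.3, 24.1, 33.7],
    "children": [0, 2, 3, 0, 1],
    "smoker":   ["no", "no", "yes", "no", "yes"],
    "region":   ["southwest", "southeast", "northwest", "northeast", "southwest"],
    "charges":  [3900.0, 8500.0, 32000.0, 2800.0, 41000.0],
})

print(df_claim.head())                               # first rows, as in the notebook
print(df_claim["bmi"].corr(df_claim["charges"]))     # linear bmi/charges relationship
print(df_claim.groupby("smoker")["charges"].mean())  # smokers claim more on average
```

On the full data set the same calls reproduce the trends shown in the plots: a weak `bmi` correlation and a large gap between smoker and non-smoker averages.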
First, log in to your IBM Cloud account. Use the video below for directions on how to create an IBM Watson Studio service.
- After logging into IBM Cloud, click `Proceed` to show that you have read your data rights.
- Click on `IBM Cloud` in the top-left corner to ensure you are on the home page.
- Within your IBM Cloud account, click on the top search bar to search for cloud services and offerings. Type in `Watson Studio` and then click on `Watson Studio` under `Catalog Results`.
- This takes you to the Watson Studio service page. There you can name the service as you wish; for example, `Watson-Studio-trial`. You can also choose which data center to create your instance in. The gif above shows the instance being created in Dallas.
- For this guide, choose the `Lite` plan, which is no-charge. It has limited compute, but it is enough to understand the main functionality of the service.
- Once you are satisfied with your service name, location, and plan, click on `Create` in the bottom-right corner. This creates your Watson Studio instance.
- To launch your Watson Studio service, go back to the home page by clicking on `IBM Cloud` in the top-left corner. There you see your services, and under there you should see your service name. This might take a minute or two to update.
- Once you see the service you just created, click on its name. This takes you to your Watson Studio instance page, which says `Welcome to Watson Studio. Let's get started!`. Click on the `Get Started` button.
- This takes you to the Watson Studio tooling. There you see a heading that says `Start by creating a project` and a button that says `Create a project`. Click on `Create a project`, and then click on `Create an empty project`.
- On the create a new project page, name your project; for example, `insurance-demo`. You also need to associate an IBM Cloud Object Storage instance so that you can store the data set.
- Under `Select storage service`, click on the `Add` button. This takes you to the IBM Cloud Object Storage service page. Leave the service on the `Lite` tier and then click the `Create` button at the bottom of the page. You are prompted to name the service and choose the resource group; once you have done so, click the `Confirm` button.
- Once you've confirmed your IBM Cloud Object Storage instance, you are taken back to the project page. Click on `Refresh` and you should see your newly created Cloud Object Storage instance under `Storage`. That's it! Now you can click `Create` at the bottom-right of the page to create your first IBM Watson Studio project :)
- Once you have created your Watson Studio project, you see a blue `Add to project` button in the top-right corner of your screen. Click on `Add to project` and then select `Data`. This brings up a column on the right-hand side that says `Data`.
- In the Data column, click on `browse` to add data from a file. Go to where you downloaded your data set in Step 0, navigate to the `data` folder, and then select `insurance.csv`.
- Watson Studio takes a couple of seconds to load the data, and then you should see that the import has completed. To make sure it worked properly, click on `Assets` at the top of the page; you should see your insurance file under `Data Assets`.
- Once you've created your project, click on `Add to project` at the top-right of your Watson Studio project page. This brings up a panel with the different assets you can add to your project. Click on `AutoAI experiment`.
- This takes you to a page that says `New AutoAI experiment` at the top-left. Name your experiment as you want; for example, `auto-ai-insurance-demo`.
- Next, you need to associate a Watson Machine Learning instance before you create the AutoAI experiment. On the right side of the screen, click on `Associate a Machine Learning instance`.
- As before, select the `Lite` tier and click on the `Create` button at the bottom of the page. Name your instance as you wish; for example, `machine-learning-free`. Choose the location and the resource group, and then click on `Confirm` when you are happy with your instance details.
- Once you create your machine learning service, you are taken back to the new AutoAI experiment page. Click on `Reload` on the right side of the screen. You should see your newly created machine learning instance. Great job! Click on `Create` in the bottom-right part of your screen to create your first AutoAI experiment!
- After you create your experiment, you are taken to a page to add a data source to your project. Click on `Select from project` and then add the `insurance.csv` file. Click on `Select asset` to confirm your data source.
- Next, AutoAI processes your data, and a `What do you want to predict` section appears. Select `charges` as the `Prediction column`.
- Next, let's explore the AutoAI settings to see what you can customize when running your experiment. Click on `Experiment settings`. First, you see the `Data source` tab, which lets you omit certain columns from your experiment; here you leave all columns in. You can also select the training data split, which defaults to 85% training data. The data source tab also shows which metric you optimize for: for regression it is RMSE (root mean squared error), and for other types of experiments, such as binary classification, AutoAI defaults to accuracy. Either way, you can change the metric from this tab depending on your use case.
- Click on the `Prediction` tab within the `Experiment settings`. There you can select from binary classification, regression, and multiclass classification.
- Lastly, the `Runtime` tab in the `Experiment settings` shows other experiment details you may want to change depending on your use case.
- Once you are happy with your settings, ensure you are predicting for the `charges` column, and click on the `Run experiment` button in the bottom-right corner of the screen.
- Next, your AutoAI experiment runs on its own. You see a progress map on the right side of the screen that shows which stage of the experiment is running; this may be hyperparameter optimization, feature engineering, or some other stage.
- Different pipelines are created, and you see the rankings of each model. Each model is ranked based on the metric you selected; in this case, that is RMSE (root mean squared error). Given that you want that number to be as small as possible, the model with the smallest RMSE is at the top of the leaderboard.
- Once the experiment is done, you see `Experiment completed` under the progress map on the right-hand side of the screen.
- Now that AutoAI has successfully generated eight different models, you can rank the models by different metrics, such as explained variance, root mean squared error, R-squared, and mean absolute error. Each time you select a different metric, the models are re-ranked by that metric.
- Let's pick RMSE as the experiment's metric. You see the smallest RMSE value is 4514.389, from Pipeline 8. Click on `Pipeline 8`.
- On the left-hand side, you can see different `Model Evaluation Measures`. For this particular model, you can view metrics such as explained variance and RMSE.
- On the left-hand side, you can also see `Feature Transformations` and `Feature Importance`.
- On the left-hand side, click on `Feature Importance`. You can see that the most important predictor of the insurance premium is whether or not you are a `smoker`. This is by far the most important feature, with `bmi` coming in as the second most important. This makes sense, given that many companies offer discounts for employees who do not smoke.
- Once you are ready to deploy one of the models, click on `Save As` at the top-right corner of the model you want to deploy, and save it as a `Model`. We show how to save it as a notebook in step 6.
- Name your model as you want; for example, `Insurance Premium Predictor - Pattern Demo`.
- Once you have finished saving the model, you see a green notification at the top-right of your screen saying that your model has been successfully saved. Click on `View in Project` in that notification.
- Next, you are taken to a screen that has the name of the model you just saved. Click on `Deployments` from the tab in the middle of the screen.
- Next, click on the `Add Deployment` button on the right side of the screen. Name your deployment as you want; for example, `demo-deployment`, and then click `Save`.
- On your saved model overview page, you should see your new deployment, `demo-deployment`, being initialized.
- Click on `demo-deployment`, or whatever you named your deployment.
- It takes a few minutes for the deployment to complete. Once it is complete, you see a `Test` tab appear at the top of the screen. Click on the `Test` tab.
- Here you can test your model. Enter input data such as `age`, `bmi`, `children`, `smoker`, and `region`, and then click the `Predict` button at the bottom of the screen.
- As you can see, the model predicts a premium of 4655 when you enter age: 27, bmi: 22, children: 0, smoker: no, region: southwest.
- To validate the prediction, you check the data file that you used to train the model and look for a row with similar inputs. You can find a 26-year-old male non-smoker with 0 children who pays a premium of 3,900. This is relatively close to the model's prediction, so you know the model is working properly.
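One way to run this sanity check programmatically is to filter the training data for rows close to the scored input. The rows and threshold ranges below are illustrative, not the real data:

```python
import pandas as pd

# Made-up rows shaped like insurance.csv (illustrative values only)
df = pd.DataFrame({
    "age": [26, 45, 27], "sex": ["male", "female", "female"],
    "bmi": [23.0, 31.5, 22.5], "children": [0, 2, 0],
    "smoker": ["no", "no", "no"],
    "region": ["southwest", "southeast", "southwest"],
    "charges": [3900.0, 8500.0, 4100.0],
})

# Rows similar to the scored input: mid-twenties, low BMI, no children, non-smoker
similar = df[(df["age"].between(23, 30))
             & (df["bmi"].between(20, 25))
             & (df["children"] == 0)
             & (df["smoker"] == "no")]
print(similar["charges"].mean())  # compare against the predicted premium
```

Running the same filter on the real `insurance.csv` gives you an average charge for comparable customers to compare against the model's output.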
If you want to run the notebook that is explored below, go to [`notebooks/Insurance Premium Predictor - P8 notebook.ipynb`](https://github.com/IBM/predict-insurance-charges-with-autoai/blob/master/notebooks/Insurance%20Premium%20Predictor%20-%20P8%20notebook.ipynb).
With AutoAI's latest features, the code that is run to create these models is no longer a black box: one or more of these models can be saved as a Jupyter notebook, and the Python code can be run and enhanced from within it.
- Click on `Save As` at the top-right corner of the model, and click `Notebook`.
- This opens a new tab (be sure to enable pop-ups for this website) titled `New Notebook`, where you can edit the default name if you choose to, and then click on `Create`. This might take a few minutes to load the first time.
- Alternatively, you can also create the notebook from the `Pipeline leaderboard` view (shown above) by clicking on the `Save as` option against the model you want to save, followed by selecting `Notebook`. The steps are very similar to the first method discussed above.
- Once the notebook has been created, it is listed under the `Notebooks` section within the `Assets` tab.
- Clicking on the notebook from the list opens the Jupyter notebook, where the Python code is available.
- If the notebook is locked, click on the pencil icon on the right tab to be able to run and edit the notebook.
- Select the `Cell` option from the menu and click `Run All`. This begins executing all steps in sequence; unless an error is encountered, the entire notebook content is executed.
While understanding the content within the notebook requires prior knowledge of machine learning using Python, we encourage you to browse through this tutorial to learn the basics of how regression models are built in Python.
In this step, you do a high-level analysis of the notebook that is generated.
- AutoAI uses scikit-learn for creating machine learning models and for executing the steps in pipelines.
- autoai-lib is used to transform data while it is being processed in the pipeline.
- The following snippet highlights sample code of how autoai-lib is used in transforming numerical data and how scikit-learn is used in setting these transformations in a pipeline.
- Here you see the Python code that went into setting up Random Forest as the algorithm of choice for regression.
- Calling the fit method on the pipeline returns an estimator, which is then used to predict a value. The code below shows each of these steps.
- Finally, the Python code that was generated to validate the results and analyze the model performance is seen below. K-fold cross-validation has been applied to evaluate the model. The notebook can also be edited to apply other validation techniques and re-evaluated.
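The shape of the generated notebook can be approximated with plain scikit-learn. This is a simplified sketch on synthetic data, not the exact AutoAI-generated code (which also uses autoai-lib transformers):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features standing in for the encoded insurance data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 1000 * X[:, 0] + 500 * X[:, 2] + rng.normal(scale=100, size=200)

# A preprocessing step followed by Random Forest regression, as in the notebook
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Fitting the pipeline returns an estimator, which is then used to predict values
estimator = pipeline.fit(X, y)
preds = estimator.predict(X[:5])

# K-fold cross-validation to evaluate the model (scikit-learn reports negated RMSE)
scores = cross_val_score(pipeline, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0),
                         scoring="neg_root_mean_squared_error")
print(preds.shape, scores.mean())
```

Swapping in the autoai-lib transformers and the exact hyperparameters from the generated notebook recovers the real pipeline.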
More information on the implementation considerations of AutoAI can be found here.
The driver code to run the application can be found under the `web-app` folder within the git repository that was cloned in Step 1. To run and test your deployed model through this Python-based user interface, you need to replace the following information within `web-app/app.py`:

- Your Watson Machine Learning `Instance ID` and `apikey` (which are associated with this deployed model).
- Your deployed model's deployment URL, so you can make a POST request.
- Your IBM Cloud IAM token, to authorize yourself.
Next, we go into detail on how to gather these credentials. If you already know how to do this, you can skip the steps below and go straight to running the application.
- Generate an IBM Cloud apikey by going to `cloud.ibm.com` and then, from the top-right part of the screen, clicking on `Manage` -> `IAM`.
- Next, click on `API keys` from the left sidebar, and then click on `Create an IBM Cloud API key`.
- Name the key as you wish, and then click `Create`.
- Once the key is created, click on the `Download` button.
- From inside Watson Studio (or Cloud Pak for Data), click on `Deployment Spaces`.
- From there, click on the name of the deployment space in which you deployed your model.
- Next, click on the name of the model.
- Next, click on the deployment of the model.
- From there, you are taken to the deployment API reference page. On the right-hand side you can see the `Deployment ID`. Go ahead and copy that and keep it handy; you will need to paste it into your `app.py` file.
- From the command line, type `curl -V` to verify that cURL is installed on your system. If cURL is not installed, refer to these instructions to get it installed.
- Execute the following cURL command to generate your access token, replacing the apikey with the apikey you got from step 7.1 above.

```
curl -X POST 'https://iam.cloud.ibm.com/oidc/token' -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=<api-key-goes-here>'
```

As shown in the image below, the apikey can be copied and pasted from the file downloaded at the end of step 7.1. The curl request would look something like this after the apikey is pasted in:

```
curl -X POST 'https://iam.cloud.ibm.com/oidc/token' -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=aSULp7nFTJl-jGx*******aQXfA6dxMlpuQ9QsOW'
```
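If you prefer doing the same token exchange from Python (as `app.py` ultimately needs a token), a minimal sketch might look like this; the function name is ours, and calling it requires a valid API key and network access:

```python
import requests

def get_iam_token(apikey: str) -> str:
    """Exchange an IBM Cloud API key for a short-lived IAM access token."""
    resp = requests.post(
        "https://iam.cloud.ibm.com/oidc/token",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        data={
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": apikey,
        },
    )
    resp.raise_for_status()
    # The JSON response carries the token under "access_token"
    return resp.json()["access_token"]
```

Remember that the returned token expires after 60 minutes, just like the one from the cURL command.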
- Install the python.org Windows distribution 3.8.3 from http://python.org. Make sure to add the `/python38/scripts` folder path to the `$PATH` environment variable; if you do not, you will get errors trying to run flask (`flask.exe` is installed to the scripts folder).
- Remove the PowerShell alias for curl and install curl from Python 3.8:

```
PS C:/> remove-item alias:curl
PS C:/> pip3 install curl
```

- Execute curl to get a secure token from IBM IAM. Please note that the token expires after 60 minutes. If you get an internal server error from the main query page (The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application), it may be due to the token expiring. Also note that in PowerShell the continuation character is the backtick (`` ` ``).

```
curl -X POST 'https://iam.cloud.ibm.com/oidc/token' -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=<apikey>'
```
- Copy and paste the access token into the header in the `web-app/app.py` file. Replace the line `" TODO: ADD YOUR IAM ACCESS TOKEN FROM IBM CLOUD HERE"` with your token.
- Modify the `app.py` file within the `web-app` directory to change the POST request to use your deployment ID. The finished line should look like the following:

```
response_scoring = requests.post("https://us-south.ml.cloud.ibm.com/ml/v4/deployments/18c7f626-04d2-4d1e-9b9b-bf2e6/predictions?version=2020-09-01", json=payload_scoring, headers=header)
```
- Once you've updated the token and the deployment id, your code should look similar to this. If it does, save it!
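For reference, the scoring request sends a JSON payload whose fields follow the training columns. The sketch below uses placeholder values and a placeholder token; the overall shape follows the Watson Machine Learning v4 predictions API:

```python
# Sketch of the scoring payload app.py sends; values here are placeholders
payload_scoring = {
    "input_data": [{
        "fields": ["age", "sex", "bmi", "children", "smoker", "region"],
        "values": [[27, "male", 22.0, 0, "no", "southwest"]],
    }]
}

# The Authorization header carries the IAM token from the previous step
header = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your-iam-access-token>",
}
print(payload_scoring["input_data"][0]["fields"])
```

Each inner list in `values` is one row to score, in the same column order as `fields`.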
- Great job! You are ready to run the application!
Note: this app was tested with Python 3.8.2.

Within the `web-app` directory, run the following command:

```
pip3 install flask flask-wtf urllib3 requests
```

Next, run the following command to start the flask application:

```
flask run
```
- Install flask and dependencies:

```
PS C:/> pip3 install flask flask-wtf urllib3 requests
```

Verify the modules have been installed in the `python38/scripts` folder.

- Run `web-app/app.py` from the local directory using flask:

```
PS C:/> set FLASK_APP=app.py
PS C:/> flask run
```
- Go to `127.0.0.1:5000` in your browser to view the application. Go ahead and fill in the form, and click on the `Predict` button to see your predicted charges based on your data.
- As is expected, if you are a smoker, this drastically increases the insurance charges.
- You can add a Dashboard, which is a lean version of the Cognos Dashboard available on IBM Cloud, from the `Add to project` option in your Watson Studio project.
- You can start finding patterns in your data by easily visualizing various data points. This can get your exploration started within a few minutes, with no coding involved.
- From visualizing this data, you can see the relationships in the data points: how gender, BMI, number of children, and smoking might influence the insurance premium.
- Dashboards are very interactive and make it easy to play with data.
- You can also pivot and summarize your measures to look at all of them quickly.
- Stop working in silos and share your findings with your team in two clicks.
- Fraud Prediction Using AutoAI
- Use AutoAI to predict Customer Churn tutorial
- Predict Loan Default with AutoAI tutorial
This code pattern is licensed under the Apache Software License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.