In this Code Pattern, we will use Jupyter Notebooks in IBM Watson Studio to build a model that predicts the programming language of a piece of code based on its text. The model will then be evaluated using IBM's Watson Natural Language Classifier.
When the reader has completed this Code Pattern, they will understand how to:
- Build a labeled data set.
- Use Watson Natural Language Classifier to create a predictive model.
- Build a predictive model within a Jupyter Notebook.
- Configure and use Watson APIs.
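As a rough illustration of what a labeled data set looks like for this task, the sketch below pairs each code snippet with a language label. Deriving the label from the file extension is an assumption made for this example; the `EXTENSION_LABELS` mapping and the helper names are hypothetical and are not taken from the pattern's notebooks.

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical extension-to-label mapping (an assumption for illustration,
# not the label set used by the pattern's notebooks).
EXTENSION_LABELS = {".py": "python", ".js": "javascript", ".go": "go", ".java": "java"}


def label_files(files):
    """Turn (path, source_text) pairs into (text, label) rows.

    Files with an unknown extension or empty content are skipped.
    """
    rows = []
    for path, text in files:
        label = EXTENSION_LABELS.get(PurePosixPath(path).suffix.lower())
        if label is not None and text.strip():
            rows.append((text, label))
    return rows


def rows_to_csv(rows):
    """Serialize rows as two-column CSV (text, label); csv handles the quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample_files = [
    ("src/app.py", "def main():\n    print('hi')"),
    ("web/index.js", "console.log('hi');"),
    ("README.md", "# docs"),  # skipped: no label for .md
]
labeled = label_files(sample_files)
csv_text = rows_to_csv(labeled)
```

Each row is a (text, label) pair, which is the general shape a text classifier needs for training; the exact column layout expected by the notebook may differ.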
- The developer creates an IBM Watson Studio workspace.
- Using Watson Studio, the developer creates a Jupyter notebook and a Watson Natural Language Classifier instance.
- The user creates a new dataset from GitHub or uses the existing one in this repo.
- The user interacts with the notebook to build a Naive Bayes classifier and a Natural Language Classifier instance using the Watson Developer Cloud SDK.
- The notebook's Python code uses the NLC APIs to create and use a classifier.
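To make the Naive Bayes step in the flow above more concrete, here is a minimal, dependency-free sketch of a multinomial Naive Bayes classifier over code tokens. It is not the notebook's implementation: the tokenizer, the class name `NaiveBayesSketch`, and the tiny training set are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter, defaultdict

# Identifiers/keywords, or a single punctuation character (digits are ignored).
TOKEN_PATTERN = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def tokenize(code):
    """Split source text into word-like tokens and punctuation marks."""
    return TOKEN_PATTERN.findall(code)


class NaiveBayesSketch:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter(labels)         # label -> number of examples
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.doc_counts.items():
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    "def add(a, b):\n    return a + b",
    "import os\nprint(os.getcwd())",
    "function add(a, b) { return a + b; }",
    "var x = 1; console.log(x);",
]
train_labels = ["python", "python", "javascript", "javascript"]

model = NaiveBayesSketch().fit(train_texts, train_labels)
prediction = model.predict("def greet(name):\n    print(name)")
```

Working in log space avoids numeric underflow on long snippets, and the add-one smoothing keeps tokens unseen for a label from zeroing out its score.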
- Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
- Watson Natural Language Classifier: Understand the intent behind text passages through custom classifiers, complete with a confidence score.
- Create IBM Cloud services
- Create a project and add services
- Create a notebook in Watson Studio
- Run the notebook in Watson Studio
- Add or change data set
Create the following service: Watson Natural Language Classifier.
- Log into IBM's Watson Studio. Once in, you'll land on the dashboard.
- Create a new project by clicking + New project and choosing Data Science.
- Enter a name for the project and click Create.
- NOTE: By creating a project in Watson Studio, a free tier Object Storage service and Watson Machine Learning service will be created in your IBM Cloud account. Select the Free storage type to avoid fees.
- Upon successful project creation, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs; we'll be using them to associate our project with any external assets (datasets and notebooks) and any IBM Cloud services.
- Associate the project with the previously created Natural Language Classifier service. Go to the Settings tab in the new project and scroll down to Associated Services. Click + and select Watson from the drop-down menu. Select an existing Watson Natural Language Classifier service or create a new one for free.
- Once your Natural Language Classifier (NLC) service is created, copy the credentials and save them for later, when you will use them in your Jupyter notebook.
- From the new project Overview panel, click + Add to project on the top right and choose the Notebook asset type.
- Fill in the following information:
  - Select the From URL tab. [1]
  - Enter a Name for the notebook and optionally a description. [2]
  - Under Notebook URL provide the following URL: https://raw.githubusercontent.com/IBM/programming-language-classifier/master/notebooks/buildmodels.ipynb [3]
  - For Runtime select the Python 3.5 option. [4]
- Click the Create button.
- TIP: Once successfully imported, the notebook should appear in the Notebooks section of the Assets tab.
When a notebook is executed, each code cell in the notebook is executed in order, from top to bottom.
Each code cell is selectable and is preceded by a tag in the left margin. The tag has the format In [x]:, where, depending on the state of the notebook, the x can be:
- A blank, which indicates that the cell has never been executed.
- A number, which represents the relative order in which this code step was executed.
- A *, which indicates that the cell is currently executing.

- Click the (►) Run button to start stepping through the notebook.
- When you get to the cell titled 3.0 Create Classifier with Watson NLC and Evaluate Classification Accuracy, insert the username and password that you saved from your Watson Natural Language Classifier instance into the code before running it.
- When you get to the cell that says 3.2 Add Classifier ID, add the classifier_id that appears in the output of 3.1 Create Classifier.
- Continue running each cell until you finish the entire notebook.
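The notebook's section 3.0 evaluates classification accuracy. As a hedged sketch of what such an evaluation can look like, the code below computes accuracy and a confusion count for any function that maps text to a label. The `stub_classify` function is a hypothetical stand-in for the real call to your trained NLC classifier (which would use your credentials and `classifier_id`); the held-out examples are made up for illustration.

```python
from collections import Counter


def evaluate(classify, examples):
    """Return (accuracy, confusion) for a classifier over labeled examples.

    classify:  callable mapping a text snippet to a predicted label.
    examples:  iterable of (text, true_label) pairs.
    confusion: Counter keyed by (true_label, predicted_label).
    """
    confusion = Counter()
    for text, true_label in examples:
        confusion[(true_label, classify(text))] += 1
    total = sum(confusion.values())
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    accuracy = correct / total if total else 0.0
    return accuracy, confusion


def stub_classify(text):
    """Hypothetical stand-in for a request to the trained NLC classifier."""
    return "python" if "def " in text else "javascript"


held_out = [
    ("def f(): pass", "python"),
    ("var y = 2;", "javascript"),
    ("print('x')", "python"),  # the stub misclassifies this one
]
accuracy, confusion = evaluate(stub_classify, held_out)
```

Because `evaluate` only needs a callable, the same function can score both the Naive Bayes model and the NLC classifier on the same held-out examples, which makes the two easy to compare.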
- The data used was generated using tools/getdata.ipynb. To use your own or another GitHub repository for analysis, use the getdata.ipynb notebook and export the data via HTTP. Point to it in notebooks/buildmodels.ipynb section 1.0 using wget.download().
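The download step can be sketched as follows. The notebook uses `wget.download()`; this example deliberately swaps in the standard library's `urllib.request` so it runs without third-party packages. The `fetch_dataset` helper and the file names are hypothetical, and a local `file://` URL stands in for the HTTP URL where you exported your data.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_dataset(url, destination):
    """Download url to destination and return the destination path."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Demo: a temporary local file plays the role of the exported data set.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.csv"
    source.write_text("print('hi'),python\n", encoding="utf-8")
    local_copy = fetch_dataset(source.as_uri(), Path(tmp) / "data" / "dataset.csv")
    downloaded_text = local_copy.read_text(encoding="utf-8")
```

To use your own data, replace the `file://` URL with the HTTP location of your exported file.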
To see the notebook with sample output, load examples/exampleNotebook.ipynb.
- Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
- Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns.
- AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos.
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.