/programming-language-classifier

Classify programming languages with Watson Studio and Natural Language Classifier

Programming Language Classification with IBM Watson Studio, Watson, and GitHub

In this Code Pattern, we will use Jupyter Notebooks in IBM Watson Studio to build a model that predicts the programming language of a piece of code from its text. The model will then be evaluated using IBM's Watson Natural Language Classifier.

When the reader has completed this Code Pattern, they will understand how to:

  • Build a labeled data set.
  • Use Watson Natural Language Classifier to create a predictive model.
  • Build a predictive model within a Jupyter Notebook.
  • Configure and use Watson APIs.

Flow

  1. The developer creates an IBM Watson Studio Workspace.
  2. Using Watson Studio, the developer creates a Jupyter notebook and a Watson Natural Language Classifier instance.
  3. The user can create a new data set from GitHub or use the existing one in this repo.
  4. The user interacts with the notebook to build a Naive Bayes classifier and a Natural Language Classifier instance using the Watson Developer Cloud SDK.
  5. The notebook's Python code can use the NLC APIs to create and use a classifier.
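The Naive Bayes step in the flow above can be illustrated with a minimal, standard-library-only sketch. This is not the notebook's actual code: the tokenizer, the `train` and `predict` helper names, and the three toy snippets are assumptions made for illustration, and a real model would be trained on the labeled data set from this repo.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    # Split source text into word-like tokens and single punctuation marks.
    return re.findall(r"[A-Za-z_]+|[^\sA-Za-z_0-9]", code)


def train(samples):
    # samples: list of (code_text, language_label) pairs.
    token_counts = defaultdict(Counter)  # per-language token frequencies
    doc_counts = Counter()               # number of snippets per language
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        token_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return token_counts, doc_counts, vocab


def predict(model, text):
    token_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical); real data would come from the repo's data set.
toy_samples = [
    ("def add(a, b):\n    return a + b", "python"),
    ("function add(a, b) { return a + b; }", "javascript"),
    ("public static int add(int a, int b) { return a + b; }", "java"),
]
model = train(toy_samples)
print(predict(model, "def square(x):\n    return x * x"))
```

With only three training snippets the model is a toy, but it shows the idea: tokens such as `def` and `:` raise the score for one language, while `{` and `;` raise it for others.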

Included components

  • Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
  • Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
  • Watson Natural Language Classifier: Understand the intent behind text passages through custom classifiers, complete with a confidence score.

Steps

  1. Create IBM Cloud services
  2. Create a project and add services
  3. Create a notebook in Watson Studio
  4. Run the notebook in Watson Studio
  5. Add or change data set

1. Create IBM Cloud services

Create the following service:

  • Watson Natural Language Classifier

2. Create a project and add services

  • Log into IBM's Watson Studio. Once in, you'll land on the dashboard.

  • Create a new project by clicking + New project and choosing Data Science.

  • Enter a name for the project and click Create.

  • NOTE: When you create a project in Watson Studio, a free-tier Object Storage service and a Watson Machine Learning service are created in your IBM Cloud account. Select the Free storage type to avoid fees.

  • After the project is created, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs. We'll use them to associate the project with external assets (data sets and notebooks) and with IBM Cloud services.

  • Associate the project with the previously created Natural Language Classifier service. Go to the Settings tab in the new project and scroll down to Associated Services. Click + and select Watson from the drop-down menu. Select an existing Watson Natural Language Classifier service or create a new one for free.

  • Once your Natural Language Classifier (NLC) service is created, copy the credentials and save them for later, when you will use them in your Jupyter notebook.
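One way to keep the saved credentials out of the notebook source is to store them as JSON and parse them at run time. The sketch below is only an illustration: the `parse_credentials` helper and the `username` and `password` key names are assumptions, so adjust them to the fields your service instance actually shows.

```python
import json

# Assumed key names; match them to the credentials shown for your service.
REQUIRED_KEYS = ("username", "password")


def parse_credentials(raw_json):
    # Parse the saved JSON text and fail early if an expected field is missing.
    creds = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise KeyError("missing credential fields: " + ", ".join(missing))
    return {key: creds[key] for key in REQUIRED_KEYS}


# Placeholder values for illustration only.
example = '{"username": "my-user", "password": "my-pass", "url": "ignored"}'
print(sorted(parse_credentials(example)))
```

In a notebook, the JSON text could be read from a local file that is kept out of version control and then passed to the helper.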

3. Create a notebook in Watson Studio

  • From the new project Overview panel, click + Add to project on the top right and choose the Notebook asset type.

  • Fill in the following information:

    add notebook

  • Click the Create button.

  • TIP: Once successfully imported, the notebook should appear in the Notebooks section of the Assets tab.

4. Run the notebook in Watson Studio

When a notebook is executed, each code cell in the notebook runs in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

  • A blank: the cell has never been executed.

  • A number: the relative order in which this code step was executed.

  • A *: the cell is currently executing.

  • Click the (►) Run button to start stepping through the notebook.

  • When you get to the cell titled 3.0 Create Classifier with Watson NLC and Evaluate Classification Accuracy, insert the username and password that you saved from your Watson Natural Language Classifier instance into the code before running it.

  • When you get to the cell that says 3.2 Add Classifier ID, add the classifier_id that appears in the output after running 3.1 Create Classifier.

  • Continue running each cell until you finish the entire notebook.
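The title of cell 3.0 mentions evaluating classification accuracy. As a rough illustration of what such an evaluation computes (this is not the notebook's own code), the sketch below compares predicted labels with the true ones. The helper names and the sample labels are made up for the example.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    # Fraction of predictions that match the true label.
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no labels to evaluate")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    # Count each (true, predicted) pair to see which languages get mixed up.
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical labels, for illustration only.
truth = ["python", "java", "python", "javascript"]
preds = ["python", "java", "javascript", "javascript"]
print(accuracy(truth, preds))  # 0.75
print(confusion_counts(truth, preds)[("python", "javascript")])  # 1
```

The confusion counts are often more informative than the single accuracy figure, because they show which pairs of languages the classifier confuses.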

5. Add or change data set

  • The data used was generated using tools/getdata.ipynb. To analyze your own or another GitHub repository, use the getdata.ipynb notebook and export the data via HTTP. Then point to the exported data in section 1.0 of notebooks/buildmodels.ipynb using wget.download().
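A labeled data set for this task pairs each code snippet with its language. The sketch below shows one possible way to build such rows by labeling files according to their extension and writing them as CSV. It is not taken from tools/getdata.ipynb: the extension map, the `label_files` and `to_csv` helpers, and the 1000-character truncation are all assumptions chosen for illustration.

```python
import csv
import io
import os

# Illustrative mapping only; extend it for the languages you care about.
EXTENSION_TO_LANGUAGE = {".py": "python", ".js": "javascript", ".java": "java"}


def label_files(files, max_chars=1000):
    # files: mapping of file path -> file contents.
    rows = []
    for path, text in files.items():
        ext = os.path.splitext(path)[1].lower()
        label = EXTENSION_TO_LANGUAGE.get(ext)
        if label is None or not text.strip():
            continue  # skip unknown extensions and empty files
        # Collapse whitespace so each example stays on one line, then truncate.
        snippet = " ".join(text.split())[:max_chars]
        rows.append((snippet, label))
    return rows


def to_csv(rows):
    # Serialize (text, label) rows; the csv module quotes fields when needed.
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical in-memory files standing in for a cloned repository.
files = {
    "src/app.py": "import os\nprint(os.getcwd())",
    "web/main.js": "console.log('hi');",
    "README.md": "# docs",
}
print(to_csv(label_files(files)))
```

Collapsing whitespace discards indentation, which can itself be a useful signal for some languages, so a real pipeline may prefer to keep the original layout and rely on CSV quoting instead.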

Sample output

To see the notebook with sample output, load examples/exampleNotebook.ipynb.
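A saved notebook is a JSON document. In the nbformat 4 layout, each code cell carries an `execution_count` and a list of `outputs`, which is how a notebook can ship with sample output. The sketch below summarizes which code cells were run; the `summarize_cells` helper and the inline sample are illustrative, and applying it to examples/exampleNotebook.ipynb assumes that file uses nbformat 4.

```python
import json


def summarize_cells(notebook_json):
    # Return execution count and output presence for each code cell.
    notebook = json.loads(notebook_json)
    summary = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        summary.append({
            "execution_count": cell.get("execution_count"),
            "has_output": bool(cell.get("outputs")),
        })
    return summary


# Tiny hand-written notebook used only for this example.
sample = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 1,
         "outputs": [{"output_type": "stream"}], "source": ["print(1)"]},
        {"cell_type": "code", "execution_count": None,
         "outputs": [], "source": ["x = 2"]},
    ],
})
for row in summarize_cells(sample):
    print(row)
```

An `execution_count` of `None` corresponds to the blank `In [ ]:` tag of a cell that has never been run.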

Learn more

  • Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
  • Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns.
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos.

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ