Experimental project to play with machine learning and Python

ML Lab Industrial Code

A simple project to experiment with exposing an ML model as a REST API.

Case

Determine (classify) an organization's industrial code based on the 'formaal' (Norwegian for 'purpose') supplied by the user.

Machine learning theory

The task at hand is one of text classification (also known as text categorization). We therefore represent the formaal as a feature vector X. We must also consider whether to apply feature selection to speed up the classification.
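As a rough illustration of the idea above, the following sketch turns a formaal into a bag-of-words feature vector and applies a very simple form of feature selection: words that occur in fewer than `min_df` documents are dropped. It uses only the standard library. The helper names (`tokenize`, `build_vocabulary`, `vectorize`), the `min_df` threshold and the sample texts are assumptions made for this example and are not taken from the project; in practice scikit-learn's `CountVectorizer` or `TfidfVectorizer` would probably do this job.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(documents, min_df=1):
    """Map each word that occurs in at least `min_df` documents to a column index."""
    document_frequency = Counter()
    for doc in documents:
        # Count each word once per document.
        document_frequency.update(set(tokenize(doc)))
    kept = sorted(word for word, df in document_frequency.items() if df >= min_df)
    return {word: index for index, word in enumerate(kept)}


def vectorize(text, vocabulary):
    """Return the feature vector X for `text`: one word count per vocabulary entry."""
    x = [0] * len(vocabulary)
    for token in tokenize(text):
        if token in vocabulary:
            x[vocabulary[token]] += 1
    return x


# Hypothetical sample data, made up for illustration.
documents = [
    "Turer i skog og mark",
    "Salg av turer og reiser",
    "Drift av barnehage",
]
vocabulary = build_vocabulary(documents, min_df=2)
X = vectorize("Turer i skog og mark", vocabulary)
```

With `min_df=2` only the words shared by at least two of the sample texts survive, so the vector has three columns instead of ten. Whether such aggressive pruning helps or hurts accuracy would have to be measured on the real data.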

What type of machine learning system are we looking at? In the following we will try to describe our system along three categories:

Supervised vs unsupervised learning

Supervised learning algorithms

For an extensive list of supervised learning algorithms supported by scikit-learn, check https://scikit-learn.org/stable/supervised_learning.html

Unsupervised learning algorithms

  • Describe our choice

Batch vs online learning

  • Describe our choice

Instance-based vs model-based training

The final category addresses how a machine learning system generalizes. There are two main approaches to generalization:

  • Instance-based learning: the system learns the examples by heart, then generalizes to new cases using a similarity measure.
  • Model-based learning: the system builds a model from the examples, then uses that model to make predictions.
  • Describe our choice
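The difference between the two approaches can be shown with a toy sketch. The instance-based predictor keeps every training example and returns the label of the most similar one (1-nearest neighbour with Jaccard similarity over word sets). The model-based predictor first condenses the examples into word counts per label and then predicts from that summary alone. The training examples, the labels `code_a` and `code_b`, and all function names are hypothetical and chosen only for illustration; this is not the approach the project has settled on.

```python
from collections import Counter, defaultdict


def words(text):
    """Represent a text as the set of its lowercase words."""
    return set(text.lower().split())


# Hypothetical training examples: (formaal, industrial code label).
examples = [
    ("turer i skog og mark", "code_a"),
    ("turer og friluftsliv", "code_a"),
    ("drift av barnehage", "code_b"),
]


def predict_instance_based(text):
    """Keep all examples and return the label of the most similar one."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)

    query = words(text)
    _, best_label = max(examples, key=lambda ex: jaccard(query, words(ex[0])))
    return best_label


def fit_model(training_examples):
    """Summarize the examples as word counts per label."""
    model = defaultdict(Counter)
    for text, label in training_examples:
        model[label].update(words(text))
    return dict(model)


def predict_model_based(model, text):
    """Score each label by how often it has seen the query's words."""
    query = words(text)
    return max(model, key=lambda label: sum(model[label][w] for w in query))


model = fit_model(examples)
```

Note that `predict_instance_based` needs the full list of examples at prediction time, while `predict_model_based` only needs the fitted `model`, which is what makes it easy to persist and ship behind an API.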

Workflow

  • Batch train a model on data
  • Expose model as API
  • Monitor and gather metrics
  • Evaluate performance
  • Update model

Solution architecture

Our ML pipeline looks like this:

[Figure: architecture]

Credits: Emily Fox & Carlos Guestrin

Solution

The solution is a simple Python project that implements a script for training a model as well as a server that exposes the API.
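To make the server half of that description more concrete, here is a minimal sketch of what such a server could look like using only the standard library's `http.server`. The actual `server.py` may well use a web framework instead. The `classify` placeholder, the `handle_payload` helper, the `ClassifyHandler` class, the `run` function and the `{"matches": [...]}` response shape are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(formaal):
    """Placeholder for the real model: should return matches, best first."""
    return []


def handle_payload(raw_body):
    """Turn a raw JSON request body into an HTTP status and a JSON response body."""
    try:
        formaal = json.loads(raw_body)["formaal"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a 'formaal' field"}
        return 400, json.dumps(error)
    return 200, json.dumps({"matches": classify(formaal)})


class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the client announced.
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))


def run(host="localhost", port=5000):
    """Start the server and block until it is interrupted."""
    HTTPServer((host, port), ClassifyHandler).serve_forever()
```

Keeping the request handling in a plain function (`handle_payload`) means it can be tested without starting a server; calling `run()` starts listening on port 5000.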

Project structure

-- helloMLAPI
  -- data                 # contains the data
    -- organizations.csv
  -- models               # contains the models trained on the data
    -- xyz.py
    -- xyz.pckl           # a persisted model
  -- api
    -- server.py          # the api
    -- Dockerfile
  -- test
    -- test.py
  -- README.md
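The `models` folder in the layout above holds a persisted model next to the script that produced it. The following sketch shows one way a training script might persist a model with `pickle` and load it again, for instance when the server starts. The `save_model` and `load_model` helpers and the stand-in model are assumptions for this example; only the file name `xyz.pckl` comes from the project structure.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize the trained model to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a previously persisted model. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained model; a real one would come from the training script.
model = {"code_a": {"turer": 2, "skog": 1}, "code_b": {"barnehage": 1}}

# A temporary directory is used here so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "xyz.pckl"
    save_model(model, model_path)
    restored = load_model(model_path)
```

In the project itself the path would point into the `models` folder rather than a temporary directory.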

Requirements

Deploy

To start the server locally:

cd api
python server.py

Docker

TODO: Describe how to build and start the api as a dockerized service

Usage locally

Once the server is running, you can call it with a tool such as curl:

curl \
  --include \
  --header "Content-Type: application/json"  \
  --request POST \
  --data '{"formaal":"Turer i skog og mark"}' \
  --url http://localhost:5000 \
  --write-out "\n"

The response should include a list of industrial codes and descriptions that match the formaal, sorted best match first.
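The same request can be sent from Python. This sketch mirrors the curl call above using the standard library's `urllib.request`. The function names `build_request` and `classify` are made up for this example, and decoding the response as JSON is an assumption, since the exact response format is not specified here.

```python
import json
import urllib.request


def build_request(formaal, url="http://localhost:5000"):
    """Build a POST request with the formaal as a JSON body."""
    body = json.dumps({"formaal": formaal}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(formaal, url="http://localhost:5000"):
    """Send the request and return the decoded response (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(formaal, url)) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("Turer i skog og mark")
```

Calling `classify("Turer i skog og mark")` requires the server to be running locally on port 5000.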

Credits