The purpose of this example is to highlight the utility of Skafos, Metis Machine's data science operationalization and delivery platform. In this example, we will:
- Build and train a model predicting cell phone churn with data on a public S3 bucket
- Save this model to a private S3 bucket
- Score new customers using this model and save these scores
- Access these scores via an API and S3
The figure below provides a functional architecture for this process.
- Sign up for a Skafos account
- Install `skafos` on your machine
- Authenticate your account via the `skafos auth` command
- A working knowledge of how to use git
The source data for this example is available in a public S3 bucket provided by Metis Machine. In the steps below, we will describe how to access it. No code modifications are required to access the input data.
This data has been slightly modified from its source, which is freely available and can be found here or here.
In the following step-by-step guide, we will walk you through how to use the code in this repository to run a job on Skafos. Following completion of this tutorial, you should be able to:
- Run the existing code and access its output on S3.
- Replace the provided data and model with your own data and model.
- Fork the churn-model-demo from GitHub. This code is freely available as part of the Skafos organization. Note that the README is a copy of these instructions.
- Clone the forked repo to your machine, and add an upstream remote to connect to the original repo, if desired.
Each Skafos project needs its own project token and a unique `metis.config.yml` file. The example `metis.config.yml.example` provided in this repo is identical in structure to what you will need, but its project token and job ids are tied to another Skafos account and organization. Creating your own `metis.config.yml` file is simple and described below.
From the top level of this project's working directory, type `skafos init` on the command line. This will generate a new `metis.config.yml` file that is tied to your Skafos account and organization.
Open up this config file and edit the first job to match the example .yml file included in the repo. Specifically, set the following fields:

```
language: python
name: build-churn-model
entrypoint: build-churn-model.py
```

Note: Do not edit the project token or job_ids in the .yml file; otherwise, Skafos will not recognize and run your job.
In the example `metis.config.yml` file, you'll note that there are two jobs: one to build a model, and one to score new users. You will need to add a second job to your Skafos project via the following command on the command line:

```
skafos create job score-new-users --project <insert-your-project-token-here>
```

This will output a job_id on the command line. Copy this job id into your `metis.config.yml` file, again using the example yaml file as a template, and include the following:
```
language: python
name: score-new-users
entrypoint: score-new-users.py
dependencies: [<job-id for build-churn-model.py>]
```
This dependency will ensure that new users are not scored until the churn model has been built. If `build-churn-model.py` does not complete, then `score-new-users.py` will not run.
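Putting the two jobs together, the resulting `metis.config.yml` might look like the sketch below. This is only an illustration of the shape: follow the `metis.config.yml.example` included in the repo for the exact layout, and note that the token and job id values shown here are placeholders for the values generated for your own account.

```yaml
project_token: <your-project-token>   # generated by `skafos init`; do not edit
jobs:
  - job_id: <build-job-id>            # generated by `skafos init`; do not edit
    language: python
    name: build-churn-model
    entrypoint: build-churn-model.py
  - job_id: <score-job-id>            # from `skafos create job score-new-users ...`
    language: python
    name: score-new-users
    entrypoint: score-new-users.py
    dependencies: [<build-job-id>]    # run only after the model has been built
```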
Now that your `metis.config.yml` file has all the necessary components, add it to the repo, commit, and push.
In Steps 3 and 4 above, you initialized a Skafos project so you can run the cloned repo in Skafos. Now, you will need to add the Skafos app to your github repository.
To do this, navigate to the Settings page for your organization, click on Installed GitHub Apps to add the Skafos app to this repository. Alternatively, if this repo is not part of an organization, navigate to your Settings page, click on Applications, and install the Skafos app.
In `common/data.py`, the AWS information used to retrieve input data and store output models and data is provided. The input S3 bucket and file names do not need to be modified; however, you will need to update the location of the output models and scores in the code, as well as the specified keyspace.
To make these changes, do the following:
- Create a private S3 bucket to save your output models and scores. This bucket will replace the existing value for `S3_PRIVATE_BUCKET` in the code.
- Provide Skafos with your `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` via the command line. `skafos env AWS_ACCESS_KEY_ID --set <key>` and `skafos env AWS_SECRET_ACCESS_KEY --set <key>` will do this.
- Update the `KEYSPACE` to be the `project_token` that was generated with the `metis.config.yml` file.
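The edits above can be sketched as follows. The constant names `S3_PRIVATE_BUCKET` and `KEYSPACE` come from the repo's `common/data.py`, but the bucket name, token value, and the `output_key` helper below are hypothetical placeholders for illustration only:

```python
# Sketch of the values to change in common/data.py.
# The constant names match the repo; the values and the helper are
# hypothetical -- substitute your own bucket and project token.

# Output location: replace with the private S3 bucket you created.
S3_PRIVATE_BUCKET = "my-churn-demo-output"

# Replace with the project_token generated alongside metis.config.yml.
KEYSPACE = "your_project_token_here"

def output_key(artifact_name: str) -> str:
    """Build an S3 key for an output artifact, namespaced by keyspace.
    (Illustrative helper, not part of the repo.)"""
    return f"{KEYSPACE}/{artifact_name}"

print(output_key("churn_model.pkl"))
```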
In Step 7, you made several changes to `common/data.py`. These changes now need to be pushed to GitHub. Once pushed, the Skafos app will pick them up and run both the training and scoring jobs.
Navigate to dashboard.metismachine.io to monitor the status of the job you just pushed. Additional documentation about how to use the dashboard can be found here.
Once your job has completed, you can verify that the predictive model itself (in the form of a `.pkl` file) and the scored users (in a `.csv` file) are in the private S3 bucket you specified in Step 7.
In addition to the data that has been output to S3, this code uses the Skafos SDK to store scored users in a Cassandra database. Specifically, the `save_scores` function writes scored users to a table.
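The exact SDK call is specific to Skafos, but the shape of the data that a function like `save_scores` writes can be sketched as plain Python. The column names below are illustrative assumptions, not the repo's actual table schema:

```python
# Hypothetical sketch: shape scored users into row dicts before handing
# them to the Skafos SDK for storage. Column names are assumptions.

def shape_scores(customer_ids, churn_probs):
    """Pair each customer id with its churn probability as a row dict."""
    return [
        {"customer_id": cid, "churn_probability": round(float(p), 4)}
        for cid, p in zip(customer_ids, churn_probs)
    ]

rows = shape_scores(["c-001", "c-002"], [0.87, 0.12])
# Each row dict would then be written to the scores table by save_scores.
```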
The scored users in Cassandra can be easily accessed via an API call. From the root project directory, type `skafos fetch --table model_scores` on the command line. This will return both a list of scores and a cURL command that can be incorporated into applications in the usual fashion to retrieve this data.
Now that you have successfully built a predictive model on Skafos and scored new data, you can adapt this code to build your own models.