Baseline Drift Detection Monitor Example

This repo contains an example Spark data drift monitor model prepared for use with ModelOp Center and the ModelOp Spark Runtime Service.

Assets

Below are the assets that are used to run this example:

Model Schema
  • Repo file: df_sample_scored_input_schema.avsc
  • HDFS path: n/a (copied via the Spark runtime service)
  • Description: Input schema for the model, copied from the base model's input schema.

Input Asset (Baseline/Training Data)
  • Repo file: df_baseline_scored.csv
  • HDFS path: /hadoop/demo/df_baseline_scored.csv
  • Description: An attachment on the base model snapshot. Input file for the model's metrics() function. The HDFS path can vary based on the external_inputs param of the metrics() function.

Input Asset (Comparator Data)
  • Repo file: df_sample_scored.csv
  • HDFS path: /hadoop/demo/df_sample_scored.csv
  • Description: An associated asset. Input file for the model's metrics() function. The HDFS path can vary based on the external_inputs param of the metrics() function.

Output Asset
  • Repo file: n/a
  • HDFS path: /hadoop/demo/drift_analysis.csv
  • Description: Output from the model's metrics() function that the MLC consumes to display in the ModelOp Center UI. The HDFS path can vary based on the OUTPUT_FILE fileUrl passed to the MLC when it is fired by API.
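
For context, below is a minimal, hypothetical sketch of how a Spark drift monitor's metrics() function could consume these assets. It is not the actual monitor.py from this repo: the assumption that each asset descriptor carries a fileUrl key mirrors the OUTPUT_FILE payload shown later in this document, and the mean-shift statistic is purely illustrative.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# modelop.metrics
def metrics(external_inputs, external_outputs, external_model_assets):
    # Hypothetical sketch only; see monitor.py for the repo's real implementation.
    spark = SparkSession.builder.appName("DriftMonitor").getOrCreate()

    baseline_path = external_inputs[0]["fileUrl"]    # e.g. /hadoop/demo/df_baseline_scored.csv
    comparator_path = external_inputs[1]["fileUrl"]  # e.g. /hadoop/demo/df_sample_scored.csv
    output_path = external_outputs[0]["fileUrl"]     # e.g. /hadoop/demo/drift_analysis.csv

    baseline = spark.read.csv(baseline_path, header=True, inferSchema=True)
    comparator = spark.read.csv(comparator_path, header=True, inferSchema=True)

    # Illustrative drift score: shift in column means, normalized by baseline stddev.
    rows = []
    numeric_cols = [f.name for f in baseline.schema.fields if str(f.dataType) != "StringType()"]
    for col in numeric_cols:
        b = baseline.agg(F.mean(col).alias("m"), F.stddev(col).alias("s")).first()
        c = comparator.agg(F.mean(col).alias("m")).first()
        score = abs((c["m"] - b["m"]) / b["s"]) if b["s"] else 0.0
        rows.append((col, float(score)))

    # Write the per-column drift scores to the output asset location.
    spark.createDataFrame(rows, ["column", "drift_score"]) \
        .coalesce(1).write.mode("overwrite").csv(output_path, header=True)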

Testing the Model

  1. Verify that your ModelOp Center instance has at least one Spark runtime service connected to an existing Spark cluster
  2. Verify that the Spark runtime service has the "test" tag
  3. Import this model into ModelOp Center via GitHub link
  4. Create a snapshot of this model. Select "Spark" for the runtime type and (optionally) select a Spark runtime service
  5. Import the base model modelop/titanic-spark
  6. Add the training (baseline) data asset to the base model as an HDFS URL attachment.
    • URL: hdfs:///hadoop/demo/df_baseline_scored.csv
  7. Create a snapshot of the base model
    • (Optional) Select the ModelOp Runtime as the runtime type for the base model
    • Add the snapshot of this drift monitor model (this repo, created in step 4) as an associated model
    • Set the associated model type to "Data Drift Model"
    • Add the comparator data as an associated asset using an HDFS URL
      • URL: hdfs:///hadoop/demo/df_sample_scored.csv
    • (Optional) Provide a DMN file for the MLC to parse with the test results
  8. Record the UUID of the base model snapshot
  9. Create a drift detection job by firing the CronTriggeredDataDriftTest MLC using the URL and JSON payload below (an example request script is shown after this list):
    • URL: http://mlc.mocaasin.modelop.center/rest/signal
    • JSON payload (note that MODEL_ID.value must be updated to the base model snapshot UUID recorded in step 8):
      {
          "name": "com.modelop.mlc.definitions.Signals_MODEL_DATA_DRIFT_TEST",
          "variables": {
              "MODEL_ID": {
                  "value": "5a31b3c6-f033-4948-b433-bae6656baa26"
              },
              "OUTPUT_FILE": {
                  "value": {
                      "assetType": "EXTERNAL_FILE",
                      "assetRole": "UNKNOWN",
                      "fileUrl": "hdfs:///hadoop/demo/drift_analysis.csv",
                      "filename": "drift_analysis.csv",
                      "fileFormat": "CSV",
                      "repositoryInfo": {
                          "repositoryType": "HDFS_REPOSITORY",
                          "host": "",
                          "port": 0
                      }
                  }
              }
          }
      }
  10. Wait for the created job to enter a COMPLETE state
  11. Navigate to the base model snapshot's test results and verify that they look similar to the following:
    {
      "number_existing_credits": 0.1661573547955825,
      "number_people_liable": 0.15643137378956762,
      "score": 0.1374666427619285,
      "label_value": 0.12909200180788885,
      "present_residence_since": 0.09589797018577487,
      "installment_rate": 0.0923431071984353,
      "purpose": 0.08901857345091589,
      "credit_amount": 0.06579786015199766,
      "age_years": 0.062297942889786954,
      "present_employment_since": 0.06092273471356515,
      "duration_months": 0.05571648405502104,
      "savings_account": 0.04709760284899042,
      "credit_history": 0.03574135542427669,
      "property": 0.03477985804578699,
      "telephone": 0.026242637204975147,
      "job": 0.024366893631309054,
      "foreign_worker": 0.01806487817678789,
      "checking_status": 0.01600519274897279,
      "installment_plans": 0.015781433298469323,
      "housing": 0.010258094196074643,
      "debtors_guarantors": 0.004689237456452252
    }
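
The signal in step 9 can be fired with any HTTP client. Below is an illustrative Python snippet that posts the payload shown above to the MLC signal URL; the MODEL_ID value is a placeholder for the base model snapshot UUID from step 8, and any authentication your instance requires is omitted:

import requests

MLC_SIGNAL_URL = "http://mlc.mocaasin.modelop.center/rest/signal"

payload = {
    "name": "com.modelop.mlc.definitions.Signals_MODEL_DATA_DRIFT_TEST",
    "variables": {
        "MODEL_ID": {"value": "<base-model-snapshot-uuid>"},  # from step 8
        "OUTPUT_FILE": {
            "value": {
                "assetType": "EXTERNAL_FILE",
                "assetRole": "UNKNOWN",
                "fileUrl": "hdfs:///hadoop/demo/drift_analysis.csv",
                "filename": "drift_analysis.csv",
                "fileFormat": "CSV",
                "repositoryInfo": {"repositoryType": "HDFS_REPOSITORY", "host": "", "port": 0},
            }
        },
    },
}

# Fire the CronTriggeredDataDriftTest MLC; add auth headers here if your instance requires them.
response = requests.post(MLC_SIGNAL_URL, json=payload)
response.raise_for_status()
print(response.status_code, response.text)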

Manual Tests

The job_submitter folder includes files that can submit a job to ModelOp Center (MOC) directly. You can update the global variables in job_submitter/submit.py to point to your instance and run different types of jobs:

BASE_URL = "http://my.modelop.center.instance"
MODEL_FILENAME = "monitor.py"
JOB_FILENAME = "job_submitter/german_credit_data_drift_job.json"

Then run submit.py to submit the job:

python3 job_submitter/submit.py
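
For reference, a minimal sketch of what such a submitter script could look like is shown below. This is not the repo's actual submit.py: the /jobs endpoint path is a placeholder (replace it with the job-creation route your ModelOp Center instance exposes), and the real script may attach the model file and handle responses differently.

import json
import requests

BASE_URL = "http://my.modelop.center.instance"
MODEL_FILENAME = "monitor.py"  # referenced by the real script; unused in this sketch
JOB_FILENAME = "job_submitter/german_credit_data_drift_job.json"

def submit_job():
    # Load the prepared job definition (it references the model and its assets).
    with open(JOB_FILENAME) as f:
        job = json.load(f)

    # Placeholder endpoint, not a documented ModelOp Center API route.
    response = requests.post(f"{BASE_URL}/jobs", json=job)
    response.raise_for_status()
    print("Submitted job:", response.json())

if __name__ == "__main__":
    submit_job()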