dssg/police-eis

Comparative model evaluation

jtwalsh0 opened this issue · 2 comments

Our model evaluations currently look at models independently. We should also compare models. Here are some of the things to look at:

  1. Correlation matrices between model predictions
    • Jaccard similarity and rank-order correlations (a sketch follows this list)
  2. The webapp should display model accuracy for easy comparison, e.g. allow sorting by accuracy
  3. Cluster models
  4. Predict model performance from model characteristics/configurations, e.g. the type of model (random forest, logistic regression), the size of the time window, the time period, and the hyperparameters would all be features. That could help uncover patterns (a second sketch follows this list)
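
For item 1, a minimal sketch of pairwise model comparison, assuming each model's predictions are available as a pandas Series of risk scores indexed by entity (officer) id. The function and variable names here are hypothetical, not part of the existing codebase:

```python
import itertools

import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def top_k_jaccard(scores_a, scores_b, k=100):
    """Jaccard similarity of the top-k entities flagged by two models."""
    top_a = set(scores_a.nlargest(k).index)
    top_b = set(scores_b.nlargest(k).index)
    return len(top_a & top_b) / len(top_a | top_b)


def pairwise_comparison(predictions, k=100):
    """Build Jaccard and Spearman matrices from {model_id: Series of scores}."""
    model_ids = list(predictions)
    jaccard = pd.DataFrame(np.eye(len(model_ids)), index=model_ids, columns=model_ids)
    spearman = jaccard.copy()
    for a, b in itertools.combinations(model_ids, 2):
        # Compare only the entities that both models scored
        common = predictions[a].index.intersection(predictions[b].index)
        scores_a, scores_b = predictions[a].loc[common], predictions[b].loc[common]
        jaccard.loc[a, b] = jaccard.loc[b, a] = top_k_jaccard(scores_a, scores_b, k)
        rho, _ = spearmanr(scores_a, scores_b)
        spearman.loc[a, b] = spearman.loc[b, a] = rho
    return jaccard, spearman
```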
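
For item 4, a rough sketch of the meta-model idea, assuming model configurations have already been flattened into a DataFrame with one row per trained model; the column contents (model type, time-window size, train period, hyperparameter values) and function names are assumptions, not an existing interface:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def explain_performance(model_configs, metric_values):
    """Fit a meta-model that predicts a performance metric from model configuration.

    model_configs: one row per trained model, columns such as model type,
    time-window size, train period, and hyperparameters (hypothetical schema).
    metric_values: the metric to predict, aligned on the same index.
    """
    # One-hot encode categorical configuration fields so they can act as features
    features = pd.get_dummies(model_configs)
    meta_model = RandomForestRegressor(n_estimators=200, random_state=0)
    cv_r2 = cross_val_score(meta_model, features, metric_values, cv=5, scoring="r2")
    meta_model.fit(features, metric_values)
    # Feature importances hint at which configuration choices drive performance
    importances = pd.Series(meta_model.feature_importances_,
                            index=features.columns).sort_values(ascending=False)
    return meta_model, importances, cv_r2
```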

The webapp should show how stable/unstable model performance is over time.

Within-model evaluation:

  • This is an absolute measure: plot precision, recall, ROC AUC, etc. over time (see the sketch below)
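
A minimal plotting sketch, assuming evaluation results live in a long-format DataFrame with columns `model_group_id`, `evaluation_date`, `metric`, and `value`; this schema and the default metric label are assumptions, not necessarily what the webapp or Tyra uses:

```python
import matplotlib.pyplot as plt


def plot_metric_over_time(evaluations, metric="precision@100"):
    """Plot one metric over time, one line per model group."""
    subset = evaluations[evaluations["metric"] == metric]
    fig, ax = plt.subplots(figsize=(10, 5))
    for group_id, rows in subset.groupby("model_group_id"):
        rows = rows.sort_values("evaluation_date")
        ax.plot(rows["evaluation_date"], rows["value"], marker="o", label=str(group_id))
    ax.set_xlabel("evaluation date")
    ax.set_ylabel(metric)
    ax.set_title("{} over time by model group".format(metric))
    ax.legend(title="model group", fontsize="small")
    return fig
```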

Between-model evaluation:

  • Plot the rank-order correlation of model rankings from one period to the next (i.e. do the same models consistently appear at the top?), and possibly the Jaccard similarity of the top-k models (see the sketch below)
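
A sketch of this stability check under the same hypothetical evaluation schema as above: the Spearman correlation of model rankings between consecutive evaluation dates, plus the Jaccard similarity of the top-k model groups:

```python
import pandas as pd
from scipy.stats import spearmanr


def ranking_stability(evaluations, metric="precision@100", top_k=10):
    """Compare model rankings between consecutive evaluation dates.

    Returns one row per pair of consecutive dates with the Spearman
    correlation of the rankings and the Jaccard similarity of the
    top-k model groups.
    """
    subset = evaluations[evaluations["metric"] == metric]
    by_date = subset.pivot(index="model_group_id",
                           columns="evaluation_date", values="value")
    rows = []
    dates = sorted(by_date.columns)
    for prev, curr in zip(dates, dates[1:]):
        # Keep only the model groups evaluated in both periods
        pair = by_date[[prev, curr]].dropna()
        rho, _ = spearmanr(pair[prev], pair[curr])
        top_prev = set(pair[prev].nlargest(top_k).index)
        top_curr = set(pair[curr].nlargest(top_k).index)
        jaccard = len(top_prev & top_curr) / len(top_prev | top_curr)
        rows.append({"from": prev, "to": curr,
                     "spearman": rho, "top_k_jaccard": jaccard})
    return pd.DataFrame(rows)
```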

This is part of Tyra now