Sotera/webpageclassifier

Finish integrating ERROR category into scores

ctwardy opened this issue · 1 comment

The scikit-learn scores seem to ignore ERROR (i.e. 'error') in the results. Note the zeros in the 'error' row of the report below, even though the confusion matrix shows activity in that row:

             precision    recall  f1-score   support

  UNCERTAIN       0.00      0.00      0.00         0
       blog       0.82      0.54      0.65        69
 classified       0.44      0.28      0.34        75
      error       0.00      0.00      0.00       240
      forum       0.77      0.80      0.78       337
       news       0.86      0.44      0.59       151
   shopping       0.52      0.70      0.60       155
       wiki       0.84      0.85      0.84        79

avg / total       0.56      0.52      0.53      1106

Confusion Matrix:
           UNCERTAIN:    0,   0,   0,   0,   0,   0,   0,   0
                blog:   15,  37,   4,   0,   3,   1,   9,   0
          classified:   23,   0,  21,   0,   0,   0,  31,   0
               error:  133,   7,   1,   0,  68,   8,  16,   7
               forum:   48,   1,   4,   0, 271,   1,  12,   0
                news:   28,   0,  10,   0,  10,  67,  30,   6
            shopping:   37,   0,   8,   0,   0,   1, 109,   0
                wiki:    7,   0,   0,   0,   2,   0,   3,  67

   µ Info: 0.39
   Total #: 1106
   #Errors:    0 	(   0 Bleached)
#Predicted: 1106
  Accuracy: 0.52

Wait, no, fixing #15 didn't fix this. The problem is that the classifier never predicts 'error'.

             precision    recall  f1-score   support
      error       0.00      0.00      0.00       240

Confusion Matrix:
               error:    7,   7,   8,  68,   1,  16, 133,   0   <-- Note the zero in the last column.

The 'error' pages are being classified as UNCERTAIN or as 'forum',
so the 'error' score is never rising above the threshold. Investigate.
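One way to confirm is to log the raw category scores before the threshold is applied. A minimal sketch of the suspected fallback logic (the names `THRESHOLD`, `scores`, and `classify` are hypothetical, not taken from webpageclassifier):

```python
THRESHOLD = 0.5  # hypothetical cutoff, stand-in for whatever the classifier uses

def classify(scores):
    """Pick the top-scoring category, falling back to UNCERTAIN
    when no score clears the threshold."""
    best = max(scores, key=scores.get)
    if scores[best] < THRESHOLD:
        return 'UNCERTAIN'  # 'error' pages land here if their score stays low
    return best

# If the 'error' score never exceeds THRESHOLD, it can never win:
print(classify({'error': 0.3, 'forum': 0.2}))  # -> 'UNCERTAIN'
print(classify({'error': 0.3, 'forum': 0.6}))  # -> 'forum'
```

If the real code follows this shape, dumping `scores['error']` for a few of the 240 misclassified pages should show whether the score is genuinely low or whether the ERROR category was never wired into the scoring at all.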