covartech/PRT

Feature Request: Method for obtaining the highest/lowest confidence observations from each target class

jmmalo03 opened this issue · 1 comments

After training and running a classifier on some observations for a binary decision problem, I often like to quickly extract the "easiest" and "most difficult" observations from each target class. In other words, I would like a method (or methods) that will quickly provide me with:
(1) The 'n' observations with the largest decision statistic from the positive class
(2) The 'n' observations with the lowest decision statistic from the positive class
(3) The 'n' observations with the largest decision statistic from the negative class
(4) The 'n' observations with the lowest decision statistic from the negative class

Alternatively, it would be nice to have a single method that independently sorts the observations under each target class according to their decision statistics.

As spec'd out (something to get (1)-(4)), I don't think this should be a method of prtDataSetClass, and it probably shouldn't be a method of prtClass or prtAction either.

It shouldn't be a method of prtDataSetClass because it makes some assumptions, e.g., that you have only one feature, that you have a "positive" and a "negative" class, etc.

If you have those circumstances, there's at least one quick way to do this:

% Example: split into H0 and H1, each sorted by yOut confidence
yOut = classifier.run(ds);
[sorted, inds] = sort(yOut.X);           % ascending decision statistic
dsSort = ds.retainObservations(inds);    % reorder the data set
dsSort0 = dsSort.retainClasses(0);       % H0 observations only
dsSort1 = dsSort.retainClasses(1);       % H1 observations only

Because the sort is ascending, the first N observations of dsSort0 are the easy H0 cases and the last N are the hard ones, and vice versa for dsSort1.
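To make the mapping back to (1)-(4) concrete, here is a minimal sketch of the same idea on plain arrays, without any PRT calls. The vectors x and y are made-up stand-ins for yOut.X and the data set's targets, and all variable names are hypothetical.

```matlab
% Hypothetical stand-in data (not from a real classifier run).
x = [0.9; 0.1; 0.4; 0.8; 0.3; 0.7];   % plays the role of yOut.X
y = [1;   0;   0;   1;   1;   0];     % plays the role of the targets
n = 2;                                % observations to pull per group

[~, inds] = sort(x);                  % ascending decision statistic
inds0 = inds(y(inds) == 0);           % H0 indices, low to high statistic
inds1 = inds(y(inds) == 1);           % H1 indices, low to high statistic

n0 = min(n, numel(inds0));            % guard against small classes
n1 = min(n, numel(inds1));

highest1 = inds1(end-n1+1:end);       % (1) largest statistics in H1 (easy)
lowest1  = inds1(1:n1);               % (2) lowest statistics in H1 (hard)
highest0 = inds0(end-n0+1:end);       % (3) largest statistics in H0 (hard)
lowest0  = inds0(1:n0);               % (4) lowest statistics in H0 (easy)
```

Each result is a vector of indices into the original observations, so something like ds.retainObservations(highest1) should pull out the corresponding observations, in the same way retainObservations(inds) is used in the example above.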

One way to put some of these together might be a new "sortBy" method, e.g.:
ds = ds.sortBy(sortVector,'withinClass',true);

So, you could do:
yOut = classifier.run(ds);
ds = ds.sortBy(yOut.X,'withinClass',true);

But that doesn't actually save a whole ton of code...?
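For reference, the within-class ordering that such a sortBy would need can be sketched in a couple of lines of base MATLAB. This is only an illustration on made-up arrays; sortVector and targets stand in for yOut.X and the data set's targets, and none of this is an existing PRT method.

```matlab
% Hypothetical stand-ins for yOut.X and the data set's targets.
sortVector = [0.9; 0.1; 0.4; 0.8; 0.3; 0.7];
targets    = [1;   0;   0;   1;   1;   0];

% sortrows orders by column 1 (class) first, then by column 2 (statistic),
% so observations end up grouped by class and sorted within each class.
[~, order] = sortrows([targets, sortVector]);

sortedTargets = targets(order);       % classes grouped together
sortedStats   = sortVector(order);    % ascending within each class
```

The resulting order vector could then be passed to ds.retainObservations(order) to reorder the data set itself.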

For now I don't see a super good reason to make a method that does the code in the example above...