An R package for computing and displaying ROC curves and DET curves, and for computing detection classifier statistics such as EER(CH), Cllr, and minimum Cllr, as well as for performing calibration. This package was formerly known as sretools. It can be used for the analysis of any two-class classifier that outputs a score, for instance biometric comparison systems.
```r
library(ROC)
data(tno.2008)
x <- tno.2008 ## get some example data
head(x)
```
This loads the package and makes an example data.frame available, containing the results TNO submitted to the NIST 2008 Speaker Recognition Evaluation. The last line shows the structure of the data frame.
The most important columns are `score` and `target`. `score` is a number produced by a classifier that tries to discriminate between same-source trials (`target==TRUE`) and different-source trials (`target==FALSE`). For any data.frame with these columns, a ROC curve can be drawn:
```r
plot(roc(x))
```
The default is to plot the miss rate, an error rate, instead of the hit rate that some people prefer.
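For those who do prefer the hit rate, a minimal base-R sketch (using only the `score` and `target` columns, not the package API) could look like this:

```r
## minimal base-R sketch: a conventional ROC with the hit rate on the y axis
o    <- order(x$score, decreasing = TRUE)
tar  <- x$target[o]
pfa  <- cumsum(!tar) / sum(!tar)  ## false-alarm probability at each threshold
phit <- cumsum(tar) / sum(tar)    ## hit probability = 1 - miss probability
plot(pfa, phit, type = "l",
     xlab = "false alarm probability", ylab = "hit probability")
```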
However, a nicer way of plotting two-class classifier scores is a so-called Detection Error Tradeoff (DET) plot, which is the same as a ROC but with warped axes.
```r
det.plot(roc(x))
## or
det.plot(x)
```
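The warping is the probit transform: both error probabilities are passed through the standard normal quantile function. As an illustration of what this means (a base-R sketch, not the package's own plotting code):

```r
## base-R illustration of DET axis warping: plot qnorm() of both error probabilities
o     <- order(x$score, decreasing = TRUE)
tar   <- x$target[o]
pfa   <- cumsum(!tar) / sum(!tar)    ## false-alarm probability
pmiss <- 1 - cumsum(tar) / sum(tar)  ## miss probability
ok    <- pfa > 0 & pfa < 1 & pmiss > 0 & pmiss < 1  ## qnorm() is infinite at 0 and 1
plot(qnorm(pfa[ok]), qnorm(pmiss[ok]), type = "l",
     xlab = "probit(false alarm)", ylab = "probit(miss)")
```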
Computing ROC statistics or plotting a ROC or DET will result in a table with basic statistics:

- `Cllr`: the cost of the log-likelihood ratio, if the score is interpreted as a calibrated log-likelihood ratio
- `Cllr.min`: the cost of the log-likelihood ratio after warping the scores with a monotonic function that minimizes `Cllr`
- `eer`: the equal error rate, in the ROC convex hull interpretation
- `mt`: the mean of the target scores
- `mn`: the mean of the non-target scores
- `nt`, `nn`, `n`: the number of target and non-target trials, and the total number of trials
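For reference, `Cllr` has a simple closed form. A minimal sketch of the standard definition (not the package's internal implementation, which also handles edge cases) is:

```r
## standard Cllr definition: mean log2 penalty over target and non-target trials,
## with scores s interpreted as natural-log likelihood ratios
cllr <- function(s, target) {
  ct <- mean(log2(1 + exp(-s[target])))  ## penalty on target trials
  cn <- mean(log2(1 + exp(s[!target])))  ## penalty on non-target trials
  (ct + cn) / 2
}
cllr(x$score, x$target)
```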
The two most important classes are `cst` (collection of supervised trials) and `roc` (receiver operating characteristic).
Most functions have been designed to handle input data flexibly: they will directly accept `data.frame`s, `cst` objects, or `roc` objects. Most of the complex computation goes into creating a `roc` object. This involves the Pool Adjacent Violators algorithm and/or a convex hull computation, so for large data frames (millions of trials) it is more efficient to compute the `roc` object first by calling `roc()`.
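For instance, a sketch of that pattern, using only the functions shown in this README:

```r
r <- roc(x)  ## do the expensive PAV / convex-hull work once
plot(r)      ## then reuse the roc object for the ROC plot...
det.plot(r)  ## ...and for the DET plot, without recomputing
```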
In summary, the analysis tools are laid out as follows:
- `cst.tnt()` and `as.cst()` create `cst` objects, which contain both the classifier scores (`score`) and the class labels (`target`).
- `roc()` computes the basic ROC statistics: probability of miss versus probability of false alarm, together with the threshold values. Further, some overall characteristics are computed, such as the equal error rate. Together with the original data, everything is stored compactly in a `roc` object.
- `roc.plot()` (the default plotting routine for an object of class `roc`) and `det.plot()` plot the ROC curve, either on linear scales or on "probability paper", i.e., probit-warped axes (R's `qnorm()`).
- `llr.plot()` shows the optimal score-to-log-likelihood-ratio function. If the scores were log-likelihood ratios, a minimum Bayes error decision could always be made by just specifying the prior and cost parameters.
- `train.logreg()` and `train.cmlg()` are two calibration functions that compute calibration parameters for simple models of score-to-log-likelihood-ratio functions. The models can be applied using `predict()` (see the sketch after this list).
- `ape.plot()` shows an applied probability of error (APE) plot: the Bayes optimal decision error rate versus the logit of the (effective) prior.
- `nbe.plot()` shows a normalized version of the APE plot: the error rates are scaled to the error rates of minimum-error Bayes decisions based on the prior.
- `tippet.plot()` draws a Tippett plot, a way of analysing a two-class classifier in the forensic literature.
- `setDCF()`, `actDCF()`, `minDCF()` and `prior.log.odds()` set Decision Cost Functions and query the actual and minimum decision costs based on a log-likelihood-ratio interpretation of the scores. A number of DCF operating points are pre-defined, including the two-point NIST SRE 2012 cost function.
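As a rough sketch of the calibration workflow (the exact signatures below are assumptions, following the note above that most functions accept a `data.frame` directly and the usual R `predict()` convention):

```r
## hedged sketch: train a logistic-regression calibration, then map scores to LLRs;
## the signatures here are assumptions, not verified against the package
cal <- train.logreg(x)  ## assumed to accept a data.frame with score/target columns
llr <- predict(cal, x)  ## assumed to follow the standard R predict() convention
```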
This R package was first conceived to support dealing with unbalanced conditions in classifier evaluation. If the performance of a classifier depends on some factor, e.g., the image resolution in a face recognition system or the number of minutiae in a fingermark comparison system, it can occur that the trial counts over the various conditions are not balanced. If the conditions are analysed separately, there is no problem, but if all trials are mixed in order to give an overall performance characteristic, the unbalanced trial counts lead to a biased performance estimate that may vary along the ROC curve.
In order to compensate for such effects, `roc()` has an option to specify a list of factors for which unbalanced trial counts can be equalized. Additionally, the main performance metrics are computed using the equalized counts. The factors can be included in the original data frame as extra columns, which makes it easy to take subsets of the data and analyse these separately, or to combine conditions and compute equal-weighted averages of the performance metrics.
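A sketch of the per-condition route, assuming a hypothetical factor column `condition` in the data frame (the equalization option of `roc()` itself is documented in `?roc`):

```r
## per-condition analysis; "condition" is a hypothetical factor column in x
rocs <- lapply(split(x, x$condition), roc)
det.plot(rocs[[1]])  ## inspect the DET curve of the first condition
```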
`plot.cond()` is a function that draws multiple DET curves in a single plot, according to a list of factors, and finally draws the overall curve using trial-count-equalized statistics.