- numpy
- scipy [optional, for scipy distance metric]
- torch, mmcv [optional, for IoU metric]
- tqdm [optional, for displaying evaluation progress]
```
pip install numpy scipy tqdm torch mmcv-full
```
To perform basic evaluation, simply instantiate a `DetectionEval` instance and call it:

```python
detection_evaluator = DetectionEval( ... )
results = detection_evaluator( gts, preds )
```

or

```python
results = DetectionEval.evaluate( gts, preds, ... )
```
The arguments are:
- `gts`: a list of ground truth annotations. If not in `(labels, boxes)` format, then `gt_processor` must be specified.
- `preds`: a list of predictions. If not in `(labels, scores, boxes)` format, then `pred_processor` must be specified.
- `thresholds`: a list or dict that maps a label to a matching threshold.
- `criterion` [optional]: matching criterion, can be one of `['iou', 'iou_2d', 'iou_bev', 'iou_3d', 'iof', 'iof_2d', 'iof_bev', 'iof_3d', 'euclidean', 'euclidean_3d']`. Defaults to `'iou'`.
- `epsilon` [optional]: minimum threshold for matching. Defaults to 0.1.
- `filters` [optional]: a list of callables that specify boxes that should be discarded or ignored. Defaults to None.
- `n_positions` [optional]: number of recall positions to use in the AP evaluation. Defaults to 100.
- `metrics` [optional]: a list of metrics that will be in the returned results. Defaults to `['ap']`.
- `gt_processor` [optional]: a callable that transforms each item in `gts` into `(labels, boxes)` format.
- `pred_processor` [optional]: a callable that transforms each item in `preds` into `(labels, scores, boxes)` format.
- `verbose` [optional]: whether to display evaluation progress. Defaults to False.
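For example, here is a minimal sketch of basic usage. The import path, the keyword-argument style, and the 2D `(x1, y1, x2, y2)` box layout are assumptions for illustration; adapt them to your setup and chosen criterion.

```python
import numpy as np

# Hypothetical import path -- adjust to wherever DetectionEval lives in your project.
from detection_eval import DetectionEval

# One sample, already in the expected formats:
#   gt   -> (labels, boxes)
#   pred -> (labels, scores, boxes)
gts = [(
    np.array([0, 1]),                                          # labels
    np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float), # boxes
)]
preds = [(
    np.array([0, 1]),                                          # labels
    np.array([0.9, 0.6]),                                      # scores
    np.array([[1, 1, 10, 10], [21, 19, 31, 29]], dtype=float), # boxes
)]

# Per-label matching thresholds (label -> IoU threshold).
thresholds = {0: 0.5, 1: 0.5}

detection_evaluator = DetectionEval(thresholds=thresholds, criterion='iou', metrics=['ap'])
results = detection_evaluator(gts, preds)
```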
For each specified filter, the detection evaluation consists of three steps:

- For each sample/frame, box matching is performed to compute TP/FP/FN breakdowns
- The TP/FP/FN information from all samples is gathered and combined
- Statistics (e.g. AP) are computed
To manually evaluate one sample and get the detailed TP/FP/FN breakdown, you can use

```python
gt_list, pred_list = DetectionEval.evaluate_one_sample( gt, pred, ... )
```

The parameters are similar to `DetectionEval.evaluate(...)`:
- `gt`: ground truth annotation. If not in `(labels, boxes)` format, then `gt_processor` must be specified.
- `pred`: predictions. If not in `(labels, scores, boxes)` format, then `pred_processor` must be specified.
- `thresholds`: a list or dict that maps a label to a matching threshold.
- `criterion` [optional]: matching criterion, can be one of `['iou', 'iou_2d', 'iou_bev', 'iou_3d', 'iof', 'iof_2d', 'iof_bev', 'iof_3d', 'euclidean', 'euclidean_3d']`. Defaults to `'iou'`.
- `epsilon` [optional]: minimum threshold for matching. Defaults to 0.1.
- `filta` [optional]: a callable that specifies boxes that should be discarded or ignored. Defaults to None.
- `ctable` [optional]: pairwise distance table between the gt and pred boxes. Defaults to None.
- `gt_processor` [optional]: a callable that transforms `gt` into `(labels, boxes)` format.
- `pred_processor` [optional]: a callable that transforms `pred` into `(labels, scores, boxes)` format.
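Continuing with the toy `gts`, `preds`, and `thresholds` from the sketch above, a per-sample call might look like this (again assuming keyword-argument style):

```python
# Match the first (and only) sample; returns one BoxList for GTs and one for predictions.
gt_list, pred_list = DetectionEval.evaluate_one_sample(
    gts[0], preds[0], thresholds=thresholds, criterion='iou', epsilon=0.1
)
```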
The return value is a tuple of two `BoxList` objects: `gt_list` and `pred_list`.

The two `BoxList` objects returned from the evaluation store the necessary TP/FP/FN information. A `BoxList` object has the following attributes:
- `ignored`: a binary mask indicating if a box is ignored during the evaluation
- `bg`: a binary mask indicating if a box does not match with any GT/pred box (e.g. IoU < epsilon)
- `localized`: a binary mask indicating if a box is correctly localized (e.g. best_iou > threshold)
- `loc_scores`: an array that records the localization scores (e.g. IoU, distance, etc.)
- `classified`: a binary mask indicating if a box is correctly classified, ignoring localization
- `gt_labels`: (matched) ground truth labels for the boxes
- `pred_labels`: (matched) predicted labels for the boxes
- `pred_scores`: (matched) predicted scores for the boxes
- `matched_idx`: an array that stores indices of the matching GT/pred boxes (for TPs)
- `data`: extra payload that can be attached by the user during evaluation
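As an illustration of how these masks might be read, the sketch below counts TPs/FPs/FNs from the `gt_list`/`pred_list` returned above. The exact combination of masks that defines a TP is an assumption here, not part of the documented API:

```python
import numpy as np

# Assumption: a predicted box counts as a TP when it is both correctly localized
# and correctly classified and not ignored; remaining non-ignored predictions are
# FPs, and non-ignored GT boxes without a correct match are FNs.
valid_pred = np.logical_not(pred_list.ignored)
tp_mask = valid_pred & np.logical_and(pred_list.localized, pred_list.classified)
fp_mask = valid_pred & np.logical_not(tp_mask)

valid_gt = np.logical_not(gt_list.ignored)
fn_mask = valid_gt & np.logical_not(np.logical_and(gt_list.localized, gt_list.classified))

print('TP:', int(tp_mask.sum()), 'FP:', int(fp_mask.sum()), 'FN:', int(fn_mask.sum()))
```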
For each sample, the evaluation will always generate two `BoxList` objects, one for GT boxes and one for predicted boxes. To correctly aggregate results and combine the `matched_idx` attribute when combining `BoxList` objects, the two lists can (and should) be paired. This is done automatically, but to manually pair two `BoxList` objects, do

```python
gt_list.pair_with(pred_list)
```
To evaluate multiple samples, you can call `DetectionEval.evaluate_one_sample` for each sample and aggregate the results. Alternatively, `DetectionEval` also provides a function that simplifies this process.

```python
gt_list, pred_list = DetectionEval.evaluate_all_samples( ... )
```

The parameters are the same as for `DetectionEval.evaluate_one_sample`, with the following exceptions:
- `gts`: instead of a single `gt` sample, `evaluate_all_samples` takes a list of `gt` samples.
- `preds`: similarly, it takes a list of `pred` samples.
- `pbar` [optional]: a `tqdm` progress bar object to show the evaluation progress. `pbar.update()` will be called after each sample evaluation.
- `callback` [optional]: a callable that will be called after evaluating each sample. It can be used to attach additional `data` to the resulting `BoxList` objects. The `callback` will be provided with the following positional arguments: `callback(sample_idx, gt, pred, gt_list, pred_list)`
```python
# Progress bar can be tqdm, tqdm.notebook or other compatible objects
from tqdm import tqdm
pbar = tqdm(total=len(gts))

# A callback that attaches extra data to the `BoxList` objects
def attach_data( sample_idx, gt, pred, gt_list, pred_list ):
    gt_list.data = gt['data']
    pred_list.data = pred['data']

gt_list, pred_list = DetectionEval.evaluate_all_samples(
    gts, preds, ..., pbar=pbar, callback=attach_data
)
```