How to calculate average precision for a multilabel classification problem
A step-by-step explanation with a simple example.
Let's consider a multilabel classification problem with 4 classes (A, B, C, D) and 5 samples (S1, S2, S3, S4, S5).
The ground truth labels for each sample are given below:
Targets

| Sample | A | B | C | D |
|---|---|---|---|---|
| S1 | 1.00 | 0.00 | 1.00 | 0.00 |
| S2 | 0.00 | 1.00 | 0.00 | 0.00 |
| S3 | 1.00 | 1.00 | 1.00 | 0.00 |
| S4 | 0.00 | 0.00 | 0.00 | 1.00 |
| S5 | 1.00 | 1.00 | 0.00 | 0.00 |
Now, let's assume that we have a classifier that predicts the following probabilities for each class and each sample:
Predictions

| Sample | A | B | C | D |
|---|---|---|---|---|
| S1 | 0.80 | 0.20 | 0.65 | 0.90 |
| S2 | 0.30 | 0.20 | 0.40 | 0.85 |
| S3 | 0.20 | 0.70 | 0.45 | 0.85 |
| S4 | 0.10 | 0.30 | 0.70 | 0.95 |
| S5 | 0.70 | 0.60 | 0.45 | 0.80 |
To calculate the mean average precision, we need to compute the precision-recall curve for each class, take the area under each curve as that class's average precision (AP), and then average the AP values over the classes.
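The whole procedure can be sketched end to end in plain Python. This is a minimal sketch rather than a reference implementation. It assumes the step-wise definition of the area under the curve, AP = Σ (Rₙ − Rₙ₋₁) · Pₙ, which is the convention used by scikit-learn's `average_precision_score`; other conventions, such as trapezoidal interpolation, give slightly different numbers. It also assumes every class has at least one positive sample. The function names `average_precision` and `mean_average_precision` are illustrative only.

```python
# Ground-truth labels and predicted probabilities from the tables above.
# Rows are samples S1..S5; columns are classes A, B, C, D.
targets = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
scores = [
    [0.80, 0.20, 0.65, 0.90],
    [0.30, 0.20, 0.40, 0.85],
    [0.20, 0.70, 0.45, 0.85],
    [0.10, 0.30, 0.70, 0.95],
    [0.70, 0.60, 0.45, 0.80],
]


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve for one class.

    Assumes y_true contains at least one positive label.
    """
    n_pos = sum(y_true)
    ap = 0.0
    prev_recall = 0.0
    # Each distinct score acts as a threshold, visited from high to low.
    # Grouping by distinct score means tied samples share one threshold.
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for s, y in zip(y_score, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(y_score, y_true) if s >= t and y == 0)
        precision = tp / (tp + fp)
        recall = tp / n_pos
        # Precision only contributes where recall increases.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(targets, scores):
    """Return the per-class AP list and its unweighted mean."""
    n_classes = len(targets[0])
    per_class = []
    for c in range(n_classes):
        y_true = [row[c] for row in targets]
        y_score = [row[c] for row in scores]
        per_class.append(average_precision(y_true, y_score))
    return per_class, sum(per_class) / n_classes


per_class_ap, map_score = mean_average_precision(targets, scores)
for name, ap in zip("ABCD", per_class_ap):
    print(f"AP({name}) = {ap:.3f}")
print(f"mAP = {map_score:.3f}")
```

Under these assumptions the sketch gives an AP of roughly 0.917, 0.867, 0.500 and 1.000 for classes A to D, and a mean of about 0.821. Tied scores (for example S1 and S2 on class B) are treated as a single threshold, so the result does not depend on how ties are ordered.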
Here are the steps to compute the precision-recall curve for class A:
1. Sort the samples by their predicted probability for class A in descending order:
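
| Sample | A | B | C | D | True labels |
|---|---|---|---|---|---|
| S1 | 0.80 | 0.20 | 0.65 | 0.90 | A, C |
| S5 | 0.70 | 0.60 | 0.45 | 0.80 | A, B |
| S2 | 0.30 | 0.20 | 0.40 | 0.85 | B |
| S3 | 0.20 | 0.70 | 0.45 | 0.85 | A, B, C |
| S4 | 0.10 | 0.30 | 0.70 | 0.95 | D |
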
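2. Compute precision and recall at each threshold, where each predicted probability serves as a threshold and every sample scoring at or above it counts as a predicted positive (TP, FP, and FN are the numbers of true positives, false positives, and false negatives):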
$$
P = {TP \over TP + FP} \quad\quad\quad R = {TP \over TP + FN}
$$
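The two steps above can be sketched in Python for class A. This is only an illustration: the variable names are made up for the example, and the running-count approach relies on the class A scores all being distinct. With tied scores, the tied samples would need to be grouped into a single threshold.

```python
# Class A only: (sample, true label, predicted probability).
class_a = [
    ("S1", 1, 0.80),
    ("S2", 0, 0.30),
    ("S3", 1, 0.20),
    ("S4", 0, 0.10),
    ("S5", 1, 0.70),
]

# Step 1: sort by predicted probability, highest first.
ranked = sorted(class_a, key=lambda row: row[2], reverse=True)

# Step 2: lower the threshold one sample at a time. Everything ranked
# at or above the current sample is treated as a predicted positive.
n_pos = sum(label for _, label, _ in ranked)
curve = []
tp = 0
fp = 0
for name, label, score in ranked:
    if label == 1:
        tp += 1
    else:
        fp += 1
    fn = n_pos - tp  # positives still below the threshold
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    curve.append((name, score, precision, recall))

for name, score, precision, recall in curve:
    print(f"{name}  threshold={score:.2f}  P={precision:.3f}  R={recall:.3f}")
```

Each row of `curve` is one point on the precision-recall curve for class A. Recall never decreases as the threshold drops, while precision falls whenever a negative sample (here S2 and S4) enters the predicted positives.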