When analyzing COCO 2017, this code's AP is a little higher than the original AP?
Hi @baolinhu, could you please be more specific?
It would be useful if you could post the values that you're referring to so I can double check. Also, the datasets are slightly different so I wouldn't be too surprised if that were the case.
@matteorr Thanks for the reply. I used this code to analyze coco2017 val, with the same prediction-result JSON file for both:
- the output of this code
- the output of other local evaluation code or the online evaluation server
Same question here: about a 0.7-point difference in mAP.
@DouYishun, thanks for adding info. I assume you mean 0.007?
Right now, the only results that seem affected are AP@IoU=0.5:0.95 | area=all (.718 instead of .711) and AP@IoU=0.75 | area=all (.796 instead of .789). Recall is not affected.
What is surprising is that the overall AP on medium and large instances is the same, but when using area=all there is a difference. This makes me think the problem might have to do with a possible difference in the annotation of small objects (which are not considered in the COCO Keypoints, but my eval code might be looking at how many small objects there are when computing precision).
I'm currently looking into it and will post updates as soon as possible.
@matteorr Yes, I mean 0.007.
Here are my results: AP@OKS=0.5:0.95, AP@OKS=0.5 and AP@OKS=0.75 are affected.
As I suspected in the previous post, the culprit is the different definition of area ranges during evaluation.
When wrapping the COCOeval.evaluate() function I pass the parameters from the COCOanalyze class:
self.cocoEval.params.areaRng = self.params.areaRng
self.cocoEval.params.areaRngLbl = self.params.areaRngLbl
self.cocoEval.params.maxDets = self.params.maxDets
self.cocoEval.params.iouThrs = sorted(self.params.oksThrs)
These values are initialized in the Params class and my default values for the area ranges are different from the values that are defined in the original cocoeval repo.
Specifically, since COCO keypoints don't have small instances, I believe the `all` area range should not be defined to include annotations with less than 32**2 pixels. That's why I defined the `all` area range as [32 ** 2, 1e5 ** 2]. Conversely, in the coco repo they define the `all` area range for keypoints as exactly the same one used for bbox and segm, i.e. [0 ** 2, 1e5 ** 2].
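For reference, here is a side-by-side sketch of the two defaults being discussed. Only the lower bound of the `all` range is what's confirmed above; the medium and large splits shown are the standard [32 ** 2, 96 ** 2] and [96 ** 2, 1e5 ** 2] ranges that come up later in the thread, and the variable names are just for illustration.

```python
# Area ranges are [min_area, max_area] pairs, ordered as ['all', 'medium', 'large'].

# coco-analyze default: 'all' excludes anything smaller than 32**2
cocoanalyze_areaRng = [[32 ** 2, 1e5 ** 2],   # all
                       [32 ** 2, 96 ** 2],    # medium
                       [96 ** 2, 1e5 ** 2]]   # large

# original cocoapi default for keypoints: 'all' starts at 0, same as bbox and segm
cocoapi_areaRng = [[0 ** 2, 1e5 ** 2],        # all
                   [32 ** 2, 96 ** 2],        # medium
                   [96 ** 2, 1e5 ** 2]]       # large
```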
This definition discrepancy means that the number of ground truth instances counted differs between the two evaluations, resulting in different AP (higher for cocoanalyze, since it considers fewer instances present), while obviously recall is not affected by it.
I think my solution makes more sense, and I reached out about it in the past, but they didn't change their code. You can easily choose whichever one you prefer, either by changing the default param values in the Params class, or by overwriting them after you instantiate an object of the COCOanalyze class and accessing its params, i.e.:
coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [96 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2]]
After this change the results will match. I'm closing this issue; please put a thumbs up if you think it's solved, or feel free to reopen if you have further comments.
@matteorr Thanks. I know what you mean, but I still don't fully understand. Since COCO keypoints don't have small instances, the number of ground truth instances should be the same, because the number of ground truth instances with area less than 32**2 pixels should be 0. So [0 ** 2, 1e5 ** 2] is equivalent to [32 ** 2, 1e5 ** 2]. I think it instead affects the number of FPs (false positive samples).
@baolinhu - With "the number of ground truth instances counted is different for the two evaluations" I really meant the number of ground truth instances that are matched to a detection. Setting the area range to a different value will also determine which detections are ignored.
So in this particular case, detections smaller than 32**2 will be ignored by my evaluation code. To convince yourself, try removing all the small detections before loading them into the COCOanalyze class, i.e.:
# keep only detections with area larger than 32**2
new_team_split_dts = [d for d in team_split_dts if d['area']>32**2]
coco_gt = COCO( annFile )
coco_dt = coco_gt.loadRes( new_team_split_dts )
coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
coco_analyze.evaluate(verbose=True, makeplots=False, savedir=saveDir, team_name=teamName)
You'll see that the results in this case are exactly the same regardless of whether coco_analyze.params.areaRng[0] is [0 ** 2, 1e5 ** 2] or [32 ** 2, 1e5 ** 2].
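To make the mechanism concrete, here is a simplified sketch of the ignore logic, not the actual cocoeval source, just the behaviour described above: an unmatched detection whose area falls outside the current area range is flagged as ignored and therefore never counted as a false positive.

```python
def flag_ignored_detections(dts, matched, aRng):
    """Simplified view of the per-image logic: an unmatched detection whose
    area lies outside [aRng[0], aRng[1]] is ignored instead of being counted
    as a false positive. `dts` is a list of detection dicts with an 'area'
    field, `matched` is a parallel list of booleans."""
    ignored = []
    for dt, is_matched in zip(dts, matched):
        outside = dt['area'] < aRng[0] or dt['area'] > aRng[1]
        ignored.append((not is_matched) and outside)
    return ignored

# With aRng = [32**2, 1e5**2], an unmatched 20x20 detection (area 400) is ignored
# and does not hurt precision; with aRng = [0, 1e5**2] it counts as a false positive.
dts = [{'area': 400}, {'area': 5000}]
print(flag_ignored_detections(dts, [False, True], [32 ** 2, 1e5 ** 2]))  # [True, False]
print(flag_ignored_detections(dts, [False, True], [0 ** 2, 1e5 ** 2]))   # [False, False]
```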
This makes sense to me. But if you still don't agree please post back, maybe I am missing your point.
- Firstly, you solved my problem. Thanks.
- Then I will state my point of view. The line
new_team_split_dts = [d for d in team_split_dts if d['area']>32**2]
decreases the number of false positive samples: a detection that is a positive but whose area is less than 32**2 gets ignored. That may be a little unreasonable (when evaluating the `all` area range you should take the perspective of not knowing the dataset, and should not add the prior information that instances have an area larger than 32**2). So P(precision) = TP / (TP + FP) will be higher, while recall = TP / (TP + FN) is not affected, because (TP + FN) is the total number of ground truth instances, which does not change (toy numbers below).
- So I think the problem is that "the number of positive detection results counted is different for the two evaluations". Should those detections be ignored when evaluating the `all` area range?
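To put toy numbers on this (the counts below are made up, only the mechanism matters): ignoring small unmatched detections removes false positives, so precision goes up while recall is untouched.

```python
# Hypothetical counts for one evaluation run.
TP = 70          # detections matched to a ground truth
FN = 30          # ground truths left unmatched
FP_small = 5     # unmatched detections with area < 32**2
FP_other = 15    # unmatched detections with area >= 32**2

# 'all' range defined as [0, 1e5**2]: every unmatched detection is a false positive
precision_cocoapi = TP / (TP + FP_small + FP_other)      # 70 / 90  ~ 0.778

# 'all' range defined as [32**2, 1e5**2]: small unmatched detections are ignored
precision_cocoanalyze = TP / (TP + FP_other)             # 70 / 85  ~ 0.824

# Recall only depends on the ground truth count, which is identical in both cases
recall = TP / (TP + FN)                                  # 70 / 100 = 0.700
```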
@baolinhu - Glad the issue is resolved.
To follow up one last time on what might be the "best" evaluation strategy: my interpretation is that since we know COCO does not have ground truth keypoints for instances with area smaller than 32**2, it is better to ignore detections whose area is too small, as they will most likely not be good because of the lack of training data. I agree this strategy might penalize algorithms that are making good keypoint predictions for small instances.
An interesting approach could be to ignore all detections with an area smaller than the minimum area with which an IoU of 0.5 is possible if the detection is perfectly overlapped with a ground truth of size 32**2.
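As a quick sanity check on that threshold (assuming the detection box is fully contained in the ground truth box, so the IoU reduces to the ratio of their areas):

```python
# IoU between a detection fully contained in a 32x32 ground truth box:
#   IoU = area_dt / area_gt >= 0.5   =>   area_dt >= 0.5 * 32**2
gt_area = 32 ** 2
min_dt_area = 0.5 * gt_area   # smallest detection area that can still reach IoU = 0.5
print(min_dt_area)            # 512.0
```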
In conclusion, I think there is no definitive "right" or "wrong" way of doing it. As long as you are aware about what are the consequences with either approach, and compare all algorithms using the same technique it shouldn't matter too much.
Yeah, I agree with your conclusion. Thanks for your patience.
> coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
> coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [96 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2]]
>
> After this change the results will match. I'm closing this issue; please put a thumbs up if you think it's solved, or feel free to reopen if you have further comments.
Worked for me, but the medium and large results are switched. In comparison with the linked code above it should be:
coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]