Metrics for the released ScanNet checkpoint on test are much lower than those in the paper?

Question

yxchng opened this issue 8 months ago · 3 comments

I downloaded the ScanNet checkpoint and evaluated it using the command:

python tools/test.py configs/oneformer3d_1xb4_scannet.py \
    work_dirs/oneformer3d_1xb4_scannet/epoch_512.pth

+----------------+---------+---------+--------+-----------+----------+
| classes        | AP_0.25 | AP_0.50 | AP     | Prec_0.50 | Rec_0.50 |
+----------------+---------+---------+--------+-----------+----------+
| cabinet        | 0.8088  | 0.7122  | 0.4896 | 0.7542    | 0.7258   |
| bed            | 0.8378  | 0.8131  | 0.6108 | 0.9848    | 0.8025   |
| chair          | 0.9826  | 0.9584  | 0.8291 | 0.9571    | 0.9305   |
| sofa           | 0.8960  | 0.8518  | 0.5913 | 0.9146    | 0.7732   |
| table          | 0.8969  | 0.8560  | 0.6588 | 0.8814    | 0.7880   |
| door           | 0.8745  | 0.7704  | 0.5673 | 0.8491    | 0.7124   |
| window         | 0.8096  | 0.6196  | 0.4199 | 0.7511    | 0.6099   |
| bookshelf      | 0.8565  | 0.7026  | 0.3991 | 0.7176    | 0.7922   |
| picture        | 0.8570  | 0.7959  | 0.6218 | 0.8571    | 0.7385   |
| counter        | 0.8100  | 0.6985  | 0.3877 | 0.7037    | 0.7451   |
| desk           | 0.8724  | 0.7428  | 0.4567 | 0.7619    | 0.7559   |
| curtain        | 0.8346  | 0.7158  | 0.4743 | 0.9375    | 0.6716   |
| refrigerator   | 0.7564  | 0.7375  | 0.6045 | 0.9024    | 0.6491   |
| showercurtrain | 0.8418  | 0.7446  | 0.6091 | 0.6857    | 0.8571   |
| toilet         | 1.0000  | 1.0000  | 0.9423 | 1.0000    | 1.0000   |
| sink           | 0.9501  | 0.8310  | 0.6196 | 0.8791    | 0.8333   |
| bathtub        | 0.9176  | 0.9032  | 0.8190 | 1.0000    | 0.9032   |
| otherfurniture | 0.7987  | 0.7230  | 0.5724 | 0.8027    | 0.6856   |
+----------------+---------+---------+--------+-----------+----------+
| Overall        | 0.8667  | 0.7876  | 0.5930 | 0.8522    | 0.7763   |
+----------------+---------+---------+--------+-----------+----------+
03/21 04:01:15 - mmengine - INFO - Epoch(test) [312/312]    miou: 0.7643  all_ap: 0.5930  all_ap_50%: 0.7876  all_ap_25%: 0.8667  pq: 0.7074  data_time: 0.0371  time: 0.1451

vs. the results table in the paper.

What am I doing wrong?

oneformer3d-contributor commented 8 months ago

Yes.

Answer 1 · 2024-03-21T09:11:13.000Z

This part of the table is for the hidden test leaderboard; the checkpoint and the first part of the table are for the validation split.

Answer 2 · 2024-03-21T09:30:24.000Z

So python tools/test.py is actually doing the same thing as validation during training?
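
For readers wondering why tools/test.py would report validation numbers: configs in the mmengine ecosystem often define the test dataloader and evaluator as aliases of the validation ones. The sketch below only illustrates that pattern. Every type name, file name, and value in it is a hypothetical placeholder and is not copied from configs/oneformer3d_1xb4_scannet.py, so check the real config to confirm how it is set up.

```python
# Hypothetical, simplified config fragment illustrating a common pattern.
# All names and values below are placeholders, not the project's real config.

val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        type='HypotheticalScanNetDataset',   # placeholder dataset type
        ann_file='scannet_val.pkl',          # placeholder: validation annotations
        test_mode=True,
    ),
)
val_evaluator = dict(type='HypotheticalSegMetric')  # placeholder metric type

# The "test" entries simply reuse the validation entries, so running the
# test script evaluates on the validation split (labels are available there).
test_dataloader = val_dataloader
test_evaluator = val_evaluator

# Show which annotation file the test loader would read under this assumption.
print(test_dataloader['dataset']['ann_file'])
```

If the config follows this pattern, the metrics printed by the test script are comparable to the validation rows of the paper, while the hidden test leaderboard numbers can only be obtained by submitting predictions to the benchmark server.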