Best Prompt
egeozsoy opened this issue · 4 comments
How did you decide the best prompt (https://github.com/yangyangyang127/PointCLIP_V2/blob/main/zeroshot_cls/trainers/best_param.py) from the many prompts listed here (https://github.com/yangyangyang127/PointCLIP_V2/blob/main/zeroshot_cls/prompts/modelnet40_1000.json)?
Is this chosen on the test set?
We adopt a post-search over the test set to find this prompt, similar to the hyperparameter tuning done by previous 3D networks (a rough sketch of such a search is shown after this list). This is because:
- There is no division between test and validation sets for ModelNet40, ScanObjectNN, and ShapeNetPart.
- Previous 3D networks directly use test sets to tune hyperparameters and choose the model with the best performance.
- In fact, if we split off a validation set instead, we obtain similar results.
We follow this practice for a fair comparison.
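For concreteness, here is a minimal sketch of what such a post-search could look like. It assumes the candidate prompts are templates containing a `{}` placeholder for the class name, and that the visual features of the evaluation shapes (e.g., CLIP-encoded depth-map projections) have already been computed and L2-normalized elsewhere; the function names and file handling are illustrative, not the repository's actual code.

```python
import json
import torch
import clip  # OpenAI CLIP package (assumed installed)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

def zeroshot_accuracy(prompt, classnames, visual_feats, labels):
    """Zero-shot accuracy of a single prompt template on precomputed visual features."""
    texts = clip.tokenize([prompt.format(c) for c in classnames]).to(device)
    with torch.no_grad():
        text_feats = model.encode_text(texts).float()
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    # Cosine-similarity classification: the nearest class text embedding wins.
    preds = (visual_feats @ text_feats.t()).argmax(dim=-1)
    return (preds == labels).float().mean().item()

def search_best_prompt(prompt_file, classnames, visual_feats, labels):
    """Score every candidate prompt on the evaluation set and keep the best one."""
    with open(prompt_file) as f:
        candidates = json.load(f)  # assumed format: a list of template strings
    scored = [(zeroshot_accuracy(p, classnames, visual_feats, labels), p)
              for p in candidates]
    best_acc, best_prompt = max(scored)
    return best_prompt, best_acc
```

Whether the search runs on the test set or on a held-out validation split only changes which `visual_feats` and `labels` are passed in; the search procedure itself is unchanged.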
Hi, I have similar doubts. If you choose the best prompt on the test set or validation set, isn't it equivalent to using labels indirectly? Strictly speaking, zero-shot settings should not use any sample labels.
Strictly speaking, no setting should use "test" labels.
Thanks for your valuable suggestion. @egeozsoy @zou-longkun
- We follow the zero-shot setting of all open-world CLIP-based methods, e.g., CoOp and CLIP-Adapter in 2D, and CLIP2Point and ULIP in 3D. They need a validation set to tune their best hyperparameters, such as CoOp's learnable prompt length, CLIP-Adapter's architecture, or CLIP2Point's 3D prompts.
- Like the popular ImageNet dataset, the 3D datasets (ModelNet40, ScanObjectNN) share the validation and test set, so we tune our prompt on the test/val set just as previous ImageNet methods (ViT, MAE) and 3D methods (PointNet, PointMLP) do.
- More specifically, CLIP itself tunes a different prompt for each downstream task in zero-shot classification. For example, on ImageNet, CLIP utilizes an ensemble of seven handcrafted templates for the best test-set performance (a sketch of this ensembling practice follows the list).
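For reference, here is a minimal sketch of that prompt-ensemble practice in the style of CLIP's zero-shot classifier: the text embeddings produced by several templates are averaged per class and re-normalized before classification. The template strings below are illustrative placeholders, not CLIP's exact set.

```python
import torch
import clip  # OpenAI CLIP package (assumed installed)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

# Illustrative templates only; CLIP's actual ImageNet ensemble uses its own handcrafted set.
templates = [
    "a photo of a {}.",
    "a rendering of a {}.",
    "a depth map of a {}.",
]

def ensemble_text_features(classnames):
    """Average the normalized text embeddings of all templates for each class."""
    class_feats = []
    for name in classnames:
        tokens = clip.tokenize([t.format(name) for t in templates]).to(device)
        with torch.no_grad():
            feats = model.encode_text(tokens).float()
        feats = feats / feats.norm(dim=-1, keepdim=True)
        mean_feat = feats.mean(dim=0)
        class_feats.append(mean_feat / mean_feat.norm())  # re-normalize the ensemble
    return torch.stack(class_feats)  # shape: [num_classes, embedding_dim]
```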
As mentioned above, every method requires a validation set for tuning to best achieve its potential, even under the CLIP-based zero-shot setting that we follow. If we split a sub-validation set for prompt tuning from the training set of ModelNet40 or ScanObjectNN, the zero-shot result of PointCLIP V2 is still competitive. We will show such an experiment in a few days.
Thanks again for your insightful comment that makes our work better!