Best Prompt
egeozsoy opened this issue · 4 comments
How did you decide the best prompt (https://github.com/yangyangyang127/PointCLIP_V2/blob/main/zeroshot_cls/trainers/best_param.py) from the many prompts listed here (https://github.com/yangyangyang127/PointCLIP_V2/blob/main/zeroshot_cls/prompts/modelnet40_1000.json)?
Is this chosen on the test set?
We adopt a post-search over the test set to find this prompt, similar to the hyperparameter tuning done by previous 3D networks (a rough sketch of such a search is shown after this list). This is because:
- There is no division between test and validation sets for ModelNet40, ScanObjectNN, and ShapeNetPart.
- Previous 3D networks directly use test sets to tune hyperparameters and choose the model with the best performance.
- In fact, if we split off a validation set instead, we obtain similar results.
We follow this practice for a fair comparison.
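For concreteness, here is a minimal sketch of what such a post-search could look like. It assumes the candidate prompts are templates containing a `{}` placeholder for the class name, and that the visual features of the evaluation shapes (e.g., CLIP-encoded depth-map projections) have already been computed and L2-normalized elsewhere; the function names and file handling are illustrative, not the repository's actual code.

```python
import json
import torch
import clip  # OpenAI CLIP package (assumed installed)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

def zeroshot_accuracy(prompt, classnames, visual_feats, labels):
    """Zero-shot accuracy of a single prompt template on precomputed visual features."""
    texts = clip.tokenize([prompt.format(c) for c in classnames]).to(device)
    with torch.no_grad():
        text_feats = model.encode_text(texts).float()
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    # Cosine-similarity classification: the nearest class text embedding wins.
    preds = (visual_feats @ text_feats.t()).argmax(dim=-1)
    return (preds == labels).float().mean().item()

def search_best_prompt(prompt_file, classnames, visual_feats, labels):
    """Score every candidate prompt on the evaluation set and keep the best one."""
    with open(prompt_file) as f:
        candidates = json.load(f)  # assumed format: a list of template strings
    scored = [(zeroshot_accuracy(p, classnames, visual_feats, labels), p)
              for p in candidates]
    best_acc, best_prompt = max(scored)
    return best_prompt, best_acc
```

Whether the search runs on the test set or on a held-out validation split only changes which `visual_feats` and `labels` are passed in; the search procedure itself is unchanged.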
Hi, I have similar doubts. If you choose the best prompt on the test set or validation set, isn't it equivalent to using labels indirectly? Strictly speaking, zero-shot settings should not use any sample labels.
Strictly speaking, no setting should use "test" labels.
Thanks for your valuable suggestion. @egeozsoy @zou-longkun
- We follow the zero-shot setting of all open-world CLIP-based methods, e.g., CoOp and CLIP-Adapter in 2D, and CLIP2Point and ULIP in 3D. They need a validation set to tune their best hyperparameters, such as CoOp's learnable prompt length, CLIP-Adapter's architecture, or CLIP2Point's 3D prompts.
- Like the popular ImageNet dataset, the 3D datasets (ModelNet40, ScanObjectNN) share the validation and test set, so we tune our prompt on the test/val set just as previous ImageNet methods (ViT, MAE) and 3D methods (PointNet, PointMLP) do.
- More specifically, CLIP itself tunes a different prompt for each downstream task in zero-shot classification. For example, on ImageNet, CLIP utilizes an ensemble of seven handcrafted templates for the best test-set performance (a sketch of this ensembling practice follows the list).
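For reference, here is a minimal sketch of that prompt-ensemble practice in the style of CLIP's zero-shot classifier: the text embeddings produced by several templates are averaged per class and re-normalized before classification. The template strings below are illustrative placeholders, not CLIP's exact set.

```python
import torch
import clip  # OpenAI CLIP package (assumed installed)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

# Illustrative templates only; CLIP's actual ImageNet ensemble uses its own handcrafted set.
templates = [
    "a photo of a {}.",
    "a rendering of a {}.",
    "a depth map of a {}.",
]

def ensemble_text_features(classnames):
    """Average the normalized text embeddings of all templates for each class."""
    class_feats = []
    for name in classnames:
        tokens = clip.tokenize([t.format(name) for t in templates]).to(device)
        with torch.no_grad():
            feats = model.encode_text(tokens).float()
        feats = feats / feats.norm(dim=-1, keepdim=True)
        mean_feat = feats.mean(dim=0)
        class_feats.append(mean_feat / mean_feat.norm())  # re-normalize the ensemble
    return torch.stack(class_feats)  # shape: [num_classes, embedding_dim]
```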
As mentioned above, every method requires a validation set for tuning to best achieve its potential, even under the CLIP-based zero-shot setting that we follow. If we split a sub-validation set for prompt tuning from the training set of ModelNet40 or ScanObjectNN, the zero-shot result of PointCLIP V2 is still competitive. We will show such an experiment in a few days.
Thanks again for your insightful comment that makes our work better!