Inquiry on Addressing Performance Issues in Zero-Shot Settings
Opened this issue · 3 comments
Thank you for your work in the anomaly detection domain. I am reaching out to discuss an aspect of your work that caught my attention, specifically the experiments conducted in a zero-shot setting.
My question centers on how you addressed the potential increase in anomaly scores for normal samples when a model trained on one dataset (e.g., MVTec) is transferred to perform zero-shot anomaly detection on a different dataset (e.g., VisA). Such a transfer commonly produces higher anomaly scores for normal samples in the new dataset, potentially increasing the false positive rate.
Could you elaborate on the strategies or methodologies employed in your work to mitigate this issue? Thank you for your time and consideration; I look forward to your insights.
Thanks for your attention.
In the zero-shot setting, the anomaly detector is by design trained on one dataset and tested on another: the training and test sets are naturally different, so anomaly scores for normal samples can indeed rise on the new dataset because of the gap between the two distributions.
To address this, we rely on powerful pre-trained models as the backbone. For example, AprilGAN and WinCLIP use CLIP, a popular and capable pre-trained vision-language model, while we use MiniGPT-4, which is designed for open-set/grounding usage.
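For illustration, here is a minimal sketch of CLIP-style zero-shot anomaly scoring in the spirit of WinCLIP/AprilGAN. This is not their actual implementation: the model choice and the two prompts are assumptions (the real methods use much larger prompt ensembles and additional machinery), but it shows why no training on the target dataset is required.

```python
# Hedged sketch of CLIP-based zero-shot anomaly scoring (not the actual
# WinCLIP/AprilGAN code): compare the image feature against "normal" vs.
# "anomalous" text prompts; no target-dataset training is involved.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

# Illustrative prompt pair; real methods ensemble many prompt templates.
prompts = ["a photo of a flawless object", "a photo of a damaged object"]
text_tokens = tokenizer(prompts)

def anomaly_score(image_path: str) -> float:
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text_tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    # Softmax over the two prompts; probability of "damaged" is the score.
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)
    return probs[0, 1].item()
```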
Furthermore, Myriad can employ zero-shot vision experts such as AprilGAN and WinCLIP to provide a prior that helps the vision encoder pay more attention to potential defect areas. With these expert-guided visual features, MiniGPT-4 can figure out whether there are unusual parts in the image, thanks to its large-scale language and multimodal pre-training. A rough sketch of this guidance step follows below.
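As a rough illustration only (a hypothetical sketch, not Myriad's actual code: the function name, tensor shapes, and the residual-reweighting scheme are all assumptions), an expert's anomaly map could modulate the visual tokens before they reach the language model like this:

```python
# Hypothetical sketch: use a vision expert's anomaly map as a prior over
# visual tokens, boosting patches in likely-defective regions before the
# tokens are projected into the language model.
import torch
import torch.nn.functional as F

def expert_guided_tokens(visual_tokens: torch.Tensor,
                         anomaly_map: torch.Tensor,
                         grid: int = 16) -> torch.Tensor:
    """visual_tokens: (B, grid*grid, D) patch features from the vision encoder.
    anomaly_map: (B, 1, H, W) score map from an expert such as AprilGAN/WinCLIP.
    """
    # Downsample the expert map to the patch grid and flatten to token order.
    prior = F.adaptive_avg_pool2d(anomaly_map, (grid, grid))  # (B, 1, g, g)
    prior = prior.flatten(2).transpose(1, 2)                  # (B, g*g, 1)
    # Min-max normalize the prior to [0, 1] per image.
    lo = prior.amin(dim=1, keepdim=True)
    hi = prior.amax(dim=1, keepdim=True)
    prior = (prior - lo) / (hi - lo + 1e-6)
    # Residual reweighting: keep every token, but emphasize suspect patches.
    return visual_tokens * (1.0 + prior)

# Usage: tokens = expert_guided_tokens(vit_patch_features, expert_anomaly_map)
# The modulated tokens would then pass through the projector into MiniGPT-4.
```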