Yunkang Cao*, Xiaohao Xu*, Chen Sun*, Xiaonan Huang, Weiming Shen. (*These authors contribute equally.)
Anomaly detection is a crucial task across different domains and data types. However, existing anomaly detection models are often designed for specific domains and modalities. This study explores the use of GPT-4V(ision), a powerful visual-linguistic model, to address anomaly detection tasks in a generic manner. We investigate the application of GPT-4V in multi-modality, multi-domain anomaly detection tasks, including image, video, point cloud, and time series data, across multiple application areas, such as industrial, medical, logical, video, 3D anomaly detection, and localization tasks. To enhance GPT-4V's performance, we incorporate different kinds of additional cues such as class information, human expertise, and reference images as prompts. Based on our experiments, GPT-4V proves to be highly effective in detecting and explaining global and fine-grained semantic patterns in zero/one-shot anomaly detection. This enables accurate differentiation between normal and abnormal instances. Overall, GPT-4V exhibits promising performance in generic anomaly detection and understanding, thus opening up a new avenue for anomaly detection.
All cases and corresponding prompts for evalutaions can be found in ./Cases
.
The evaluations were conducted before November 7, 2023. GPT-4V has shown gradual improvement and may deliver even stronger performance in the future.
- Evaluate quantitative results
- Evaluate GPT-4V in multi-round conversations
GPT-4V demonstrates robust anomaly detection capabilities in various multi-modal and multi-field tasks. It excels in comprehending image context, discerning normal standards, and comparing provided images effectively.
-
Multi-Modality Anomaly Detection: GPT-4V handles diverse data types like images, point clouds, and X-rays, making it adaptable to multi-modal tasks, surpassing single-modal detectors.
-
Multi-Field Anomaly Detection: GPT-4V performs well in industrial, medical, pedestrian, traffic, and time series anomaly detection, showcasing its versatility across domains.
-
Zero/One-Shot Anomaly Detection: GPT-4V adapts to different inference scenarios, utilizing language prompts for anomaly detection, with or without reference images.
-
Global Semantics: GPT-4V recognizes overarching abnormal patterns or behaviors, making it suitable for identifying anomalies in broader contexts, e.g., traffic anomaly detection.
-
Fine-Grained Semantics: GPT-4V precisely localizes anomalies within complex data, enhancing its ability to detect subtle irregularities, e.g., industrial image anomaly detection.
GPT-4V automatically reasons complex normal standards and provides explanations for detected anomalies. It adds interpretability to its results, making it valuable for understanding irregularities in various domains.
Additional prompts improve GPT-4V's anomaly detection performance, responding to class information, human expertise, and reference images.
While GPT-4V shows promise, challenges exist in highly complex scenarios, such as industrial applications and ethical constraints in the medical field. Further enhancements and fine-tuning may be required to address these challenges and unlock its potential.
Task Introduction: Industrial image anomaly detection is a critical component of manufacturing processes aimed at upholding product quality. Following the establishment of the MVTec AD dataset, various methods have thrived in this field.
Task Introduction: Industrial image anomaly localization entails a more intricate process than mere image anomaly detection. Its potential for image anomaly localization warrants further exploration.
Task Introduction: Geometrical information holds a crucial role in fields like industrial anomaly detection, especially when dealing with categories lacking textual information. In contrast, CPMF proposes a novel approach by transforming point clouds into depth images.
Task Introduction: In addition to structural anomalies, there exists another type of anomaly, named logical anomalies. Existing logical anomaly detection methods typically relied on solely visual context.
Task Introduction: Anomaly detection in medical imaging is pivotal for early diagnosis and effective treatment planning. GPT-4V shows promising future in enhancing the performance of anomaly detection tasks in various medical imaging modalities.
![Medical Image Anomaly Detection](./images/Chest X-Ray Case_1.png) ![Medical Image Anomaly Detection](./images/Chest X-Ray Case_2.png)
Task Introduction: Anomaly localization is imperative for clinicians to understand the extent and nature of the pathology.
![Medical Image Anomaly Localization](./images/Endoscopy Localization Case.png)
Task Introduction: Pedestrian anomaly detection is dedicated to recognizing irregular activities within pedestrian interactions captured in video streams.
Task Introduction: Traffic anomaly detection aims at identifying the commencement and conclusion of abnormal events in traffic scenarios.
Task Introduction: Time series anomaly detection refers to the task of identifying unusual or abnormal patterns in sequential data over time.
For more details, please refer to the document.
If you found this study useful in your research or applications, please kindly cite using the following BibTeX:
@article{cao2023genericad,
title={Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead},
author={Yunkang Cao and Xiaohao Xu and Chen Sun and Xiaonan Huang and Weiming Shen},
journal={arXiv preprint arXiv:2311.02782},
year={2023}
}
Our study is largely inspired by GPT-4V, GPT-4V for Medical, SoM, CPMF. Thanks for their wonderful work!