/M3DM

Primary LanguagePythonMIT LicenseMIT

Multimodal Industrial Anomaly Detection via Hybrid Fusion (CVPR 2023)

Abstract

2D-based Industrial Anomaly Detection has been widely discussed, however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTec-3D AD dataset.

piplien

  • The pipeline of Multi-3D-Memory (M3DM). Our M3DM contains three important parts: (1) Point Feature Alignment (PFA) converts Point Group features to plane features with interpolation and project operation, $\text{FPS}$ is the farthest point sampling and $\mathcal{F_{pt}}$ is a pretrained Point Transformer; (2) Unsupervised Feature Fusion (UFF) fuses point feature and image feature together with a patch-wise contrastive loss $\mathcal{L_{con}}$, where $\mathcal{F_{rgb}}$ is a Vision Transformer, $\chi_{rgb},\chi_{pt}$ are MLP layers and $\sigma_r, \sigma_p$ are single fully connected layers; (3) Decision Layer Fusion (DLF) combines multimodal information with multiple memory banks and makes the final decision with 2 learnable modules $\mathcal D_a, \mathcal{D_s}$ for anomaly detection and segmentation, where $\mathcal{M_{rgb}}$, $\mathcal{M_{fs}}$, $\mathcal{M_{pt}}$ are memory banks, $\phi, \psi$ are score function for single memory bank detection and segmentation, and $\mathcal{P}$ is the memory bank building algorithm.

Setup

We implement this repo with the following environment:

  • Ubuntu 18.04
  • Python 3.8
  • Pytorch 1.9.0
  • CUDA 11.3

Install the other package via:

pip install -r requirement.txt
# install knn_cuda
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
# install pointnet2_ops_lib
pip install "git+git://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

Data Download and Preprocess

Dataset

After download, put the dataset in dataset folder.

Datapreprocess

To run the preprocessing

python utils/preprocessing.py datasets/mvtec3d/

It may take a few hours to run the preprocessing.

Checkpoints

The following table lists the pretrain model used in M3DM:

Backbone Pretrain Method
Point Transformer Point-MAE
Point Transformer Point-Bert
ViT-b/8 DINO
ViT-b/8 Supervised ImageNet 1K
ViT-b/8 Supervised ImageNet 21K
ViT-s/8 DINO
UFF UFF Module

Put the checkpoint files in checkpoints folder.

Train and Test

Train and test the double lib version and save the feature for UFF training:

mkdir -p datasets/patch_lib
python3 main.py \
--method_name DINO+Point_MAE \
--memory_bank multiple \
--rgb_backbone_name vit_base_patch8_224_dino \
--xyz_backbone_name Point_MAE \
--save_feature \

Train the UFF:

OMP_NUM_THREADS=1 python3 -m torch.distributed.launch --nproc_per_node=1 fusion_pretrain.py    \
--accum_iter 16 \
--lr 0.003 \
--batch_size 16 \
--data_path datasets/patch_lib \
--output_dir checkpoints \

Train and test the full setting with the following command:

python3 main.py \
--method_name DINO+Point_MAE+Fusion \
--use_uff \
--memory_bank multiple \
--rgb_backbone_name vit_base_patch8_224_dino \
--xyz_backbone_name Point_MAE \
--fusion_module_path checkpoints/{FUSION_CHECKPOINT}.pth \

Note: if you set --method_name DINO or --method_name Point_MAE, set --memory_bank single at the same time.

If you find this repository useful for your research, please use the following.

@inproceedings{wang2023multimodal,
  title={Multimodal Industrial Anomaly Detection via Hybrid Fusion},
  author={Wang, Yue and Peng, Jinlong and Zhang, Jiangning and Yi, Ran and Wang, Yabiao and Wang, Chengjie},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8032--8041},
  year={2023}
}

Thanks

Our repo is built on 3D-ADS and MoCo-v3, thanks their extraordinary works!