Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation

This repository is the official implementation of Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation.

Visualization of Some Prediction Results

[Figure 1: visualization of some prediction results]

Requirements

We implemented this repo with the following environment:

  • Ubuntu 22.04
  • CUDA 12.1
  • Python 3.11
  • PyTorch 2.2.0

To install requirements:

# Please install PyTorch first, before any other packages

# Install KNN_CUDA
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
# Install Pointnet2_PyTorch (pointnet2_ops)
git clone https://github.com/erikwijmans/Pointnet2_PyTorch.git
cd Pointnet2_PyTorch
pip install -r requirements.txt
# You may encounter compilation issues for Pointnet2_PyTorch (see the note below).

# Now go back and install the remaining packages for CMDIAD :)
cd ..
pip install -r requirements.txt

📋 Sometimes conda's version resolution causes the installation to fail. We recommend creating a virtual environment with venv or conda and then using pip to install all packages. If you encounter compilation issues with Pointnet2_PyTorch, please modify pointnet2_ops_lib/setup.py following the attempts in this Pull request.
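
As a reference, a minimal venv setup could look like the following (the environment name is arbitrary, and the exact PyTorch install command depends on your CUDA version):

# Create and activate a virtual environment
python3 -m venv cmdiad_env
source cmdiad_env/bin/activate
# Install PyTorch 2.2.0 first, e.g. the CUDA 12.1 build
pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121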

Dataset and Pre-trained Models

Dataset

The MVTec 3D-AD dataset can be downloaded from MVTec3D-AD. It should be unzipped and placed under the datasets folder.
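
For example, assuming the archive has been downloaded to the repository root (the file name below is a placeholder), unpacking could look like:

mkdir -p datasets/mvtec_3d
# Adjust the paths if the archive already contains a top-level mvtec_3d folder
tar -xf <downloaded_archive>.tar.xz -C datasets/mvtec_3d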

Data Pre-processing

python utils/preprocessing.py --dataset_path datasets/mvtec_3d/ 

📋 It is recommended to keep the default dataset path to prevent problems in subsequent training and evaluation, but you can change the number of threads used according to your configuration. Please note that the pre-processing is performed in place.

Checkpoints

| Purpose | Checkpoint |
| --- | --- |
| Point Clouds (PCs) feature extractor | Point-MAE |
| RGB images feature extractor | DINO |
| Feature-to-Feature network (main PCs) | MTFI_FtoF_PCs |
| Feature-to-Input network (main PCs) | MTFI_FtoI_PCs |
| Input-to-Feature network (main PCs) | MTFI_ItoF_PCs |
| Feature-to-Feature network (main RGB) | MTFI_FtoF_RGB |
| Feature-to-Input network (main RGB) | MTFI_FtoI_RGB |
| Input-to-Feature network (main RGB) | MTFI_ItoF_RGB |

📋 Please put all checkpoints in the checkpoints folder.
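
For example (the .pth file names match the evaluation commands below):

mkdir -p checkpoints
# Place the downloaded files here, e.g.
# checkpoints/MTFI_FtoF_PCs.pth, checkpoints/MTFI_ItoF_RGB.pth, ...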

Training

To train the models in the paper, run these commands:

MTFI pipeline with Feature-to-Feature distillation network:

To save the features for distillation network training:

python main.py \
--method_name DINO+Point_MAE \
--experiment_note <your_note> \
--save_feature_for_fusion \
--save_path datasets/patch_lib

The results are saved in the results folder. If you need to write the raw anomaly scores at the image or pixel level to a file, add --save_raw_results or --save_seg_results. You can use utils/heatmap to generate visualizations similar to those shown above. You can set the maximum number of threads with --cpu_core_num and leave a note for the run through --experiment_note.
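
For instance, a run that also writes out raw image-level and pixel-level scores and caps the thread count could look like this (the thread count is illustrative):

python main.py \
--method_name DINO+Point_MAE \
--experiment_note <your_note> \
--save_raw_results \
--save_seg_results \
--cpu_core_num 8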

To train the MTFI pipeline with the Feature-to-Feature distillation network:

python hallucination_network_pretrain.py \
--lr 0.0005 \
--batch_size 32 \
--data_path datasets/patch_lib \
--output_dir <your_output_dir_path> \
--train_method HallucinationCrossModality \
--num_workers 2

📋 For the MTFI pipeline with the Feature-to-Feature distillation network, the networks for PCs and RGB images as the main modality are trained simultaneously. If your GPU memory is not enough, try --accum_iter 2 for gradient accumulation and change --batch_size to 16 correspondingly. The data is loaded into GPU memory in advance to speed up training; you can change this through the dataset and dataloader.
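
Spelled out, the memory-saving variant suggested above would be:

python hallucination_network_pretrain.py \
--lr 0.0005 \
--batch_size 16 \
--accum_iter 2 \
--data_path datasets/patch_lib \
--output_dir <your_output_dir_path> \
--train_method HallucinationCrossModality \
--num_workers 2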

MTFI pipeline with Feature-to-Input distillation network:

To save the features for distillation network training:

python main.py \
--method_name DINO+Point_MAE \
--experiment_note <your_note> \
--save_frgb_xyz \
--save_path_frgb_xyz datasets/frgb_xyz \
--save_rgb_fxyz \
--save_path_rgb_fxyz datasets/rgb_fxyz

For PCs as the main modality:

python hallucination_network_pretrain.py \
--lr 0.0005 \
--batch_size 32 \
--data_path datasets/rgb_fxyz \
--output_dir <your_output_dir_path> \
--train_method XYZFeatureToRGBInputConv

For RGB images as the main modality:

python hallucination_network_pretrain.py \
--lr 0.0005 \
--batch_size 32 \
--data_path datasets/frgb_xyz \
--output_dir <your_output_dir_path> \
--train_method RGBFeatureToXYZInputConv

MTFI pipeline with Input-to-Feature distillation network:

Similarly, you need to store the features for distillation network training:

python main.py \
--method_name DINO+Point_MAE \
--experiment_note <your_note> \
--save_frgb_xyz \
--save_path_frgb_xyz datasets/frgb_xyz \
--save_rgb_fxyz \
--save_path_rgb_fxyz datasets/rgb_fxyz

For PCs as the main modality:

python -u hallucination_network_pretrain.py \
--lr 0.0003 \
--batch_size 32 \
--data_path datasets/frgb_xyz \
--output_dir <your_output_dir_path> \
--train_method XYZInputToRGBFeatureHRNET \
--c_hrnet 128 \
--pin_mem

For RGB images as the main modality:

python -u hallucination_network_pretrain.py \
--lr 0.0002 \
--batch_size 32 \
--data_path datasets/rgb_fxyz \
--output_dir <your_output_dir_path> \
--train_method XYZInputToRGBFeatureHRNET \
--c_hrnet 192 \
--pin_mem

Evaluation

Evaluate the model on MVTec 3D-AD with the single and dual memory bank methods

For a single PCs memory bank:

python main.py \
--method_name Point_MAE \
--experiment_note <your_note>

📋 For a single RGB memory bank or the dual memory bank, please replace Point_MAE with DINO or DINO+Point_MAE, respectively.
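
For instance, the dual memory bank evaluation would then be:

python main.py \
--method_name DINO+Point_MAE \
--experiment_note <your_note>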

MTFI pipeline with Feature-to-Feature distillation network:

For PCs as the main modality:

python main.py \
--method_name WithHallucination \
--use_hn \
--main_modality xyz \
--fusion_module_path checkpoints/MTFI_FtoF_PCs.pth \
--experiment_note <your_note>

📋 For RGB images as the main modality, please replace xyz with rgb for --main_modality and pass the new checkpoint path checkpoints/MTFI_FtoF_RGB.pth to the model.
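
Spelled out, following the note above:

python main.py \
--method_name WithHallucination \
--use_hn \
--main_modality rgb \
--fusion_module_path checkpoints/MTFI_FtoF_RGB.pth \
--experiment_note <your_note>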

MTFI pipeline with Feature-to-Input distillation network:

For PCs as the main modality:

python main.py \
--method_name WithHallucinationFromFeature \
--use_hn_from_rgb_conv \
--main_modality xyz \
--fusion_module_path checkpoints/MTFI_FtoI_PCs.pth \
--experiment_note <your_note>

📋 For RGB images as the main modality, replace xyz with rgb and pass the new checkpoint path (checkpoints/MTFI_FtoI_RGB.pth) to the model.
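
Spelled out, following the note above (assuming the other flags stay unchanged):

python main.py \
--method_name WithHallucinationFromFeature \
--use_hn_from_rgb_conv \
--main_modality rgb \
--fusion_module_path checkpoints/MTFI_FtoI_RGB.pth \
--experiment_note <your_note>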

MTFI pipeline with Input-to-Feature distillation network:

For PCs as the main modality:

python main.py \
--method_name WithHallucination \
--use_hrnet \
--main_modality xyz \
--c_hrnet 128 \
--fusion_module_path checkpoints/MTFI_ItoF_PCs.pth \
--experiment_note <your_note>

For RGB images as the main modality:

python main.py \
--method_name WithHallucination \
--use_hrnet \
--main_modality rgb \
--c_hrnet 192 \
--fusion_module_path checkpoints/MTFI_ItoF_RGB.pth \
--experiment_note <your_note>

Citation

If you find this repository helpful for your project, please cite the following.

@misc{sui2024crossmodal,
      title={Cross-Modal Distillation in Industrial Anomaly Detection: Exploring Efficient Multi-Modal IAD}, 
      author={Wenbo Sui and Daniel Lichau and Josselin Lefèvre and Harold Phelippeau},
      year={2024},
      eprint={2405.13571},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

We appreciate the following GitHub repos for their valuable code: