How to train and test MemSeg in other datasets
jamdodot opened this issue · 11 comments
Hello, sorry to bother you. How can I set up training and testing on other datasets, such as KolektorSDD? Do I first have to adjust the structure of the dataset and then make some changes in the code? 🙏 @TooTouch
For example, the new dataset is 228 x 630 pixels. Should it be resized to 256 x 256?
I mainly changed anomaly_mask.json, and the datadir and batch_size in the configs.yaml file, and modified the new dataset to follow the MVTec dataset structure. Now the code runs and trains, but SEED: 42 has not changed. Does this mean that the parameters are still the same as before?
Hi, @jamdodot
> For example, the new dataset is 228 x 630 pixels. Should it be resized to 256 x 256?
Converting to 256 x 256 is the easiest way to go. However, if the aspect ratio of your existing images is important to you, you may want to leave them unconverted.
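For example, with torchvision the resize is a one-liner (a rough sketch; the repo's actual transform pipeline may differ):

```python
from torchvision import transforms

# Sketch: resize 228 x 630 KolektorSDD images to a 256 x 256 input size.
# Note that Resize((256, 256)) takes (height, width) and does NOT preserve
# the original aspect ratio.
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
```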
> I mainly changed anomaly_mask.json, and the datadir and batch_size in the configs.yaml file, and modified the new dataset to follow the MVTec dataset structure. Now the code runs and trains, but SEED: 42 has not changed. Does this mean that the parameters are still the same as before?
Yes, the seed is for reproducibility, so repeated runs produce the same result.
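For context, a typical seeding helper looks like the sketch below; this is an illustration, not necessarily the repo's exact code:

```python
import os
import random

import numpy as np
import torch

def set_seed(seed: int = 42):
    """Seed all RNGs so repeated runs produce the same result (sketch)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
```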
Thank you for your answer. I still have some questions I would like to ask you.
1. Can the seed parameter be left unchanged?
2. Metrics are calculated every 100 iterations. I added two lines of print statements to the `evaluate` function:
   ```python
   image_masks = np.array(image_masks)
   anomaly_map = np.array(anomaly_map)
   print(image_targets)   # added
   print(anomaly_score)   # added
   auroc_image = roc_auc_score(image_targets, anomaly_score)
   ```
   What confuses me is that the values in the `image_targets` array do not change between evaluations.
3. My training curves look really weird and I don't know what's wrong. Can you give me some advice? 😭 🙏
wandb-Link
output.log
A1. The seed parameter can be changed to any number.
A2. The values in the `image_targets` array do not change between evaluations, because `shuffle` is set to `False` for the testloader.
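In other words, the test set is traversed in the same fixed order every evaluation, roughly like this (a self-contained sketch with dummy data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the MemSeg test set (images + binary labels).
testset = TensorDataset(
    torch.randn(8, 3, 256, 256),
    torch.tensor([0, 0, 1, 1, 0, 1, 0, 1]),
)

# With shuffle=False the test set is iterated in the same fixed order on
# every evaluation, so image_targets comes out identical each time.
testloader = DataLoader(testset, batch_size=4, shuffle=False)

for images, targets in testloader:
    print(targets)  # same order every run
```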
A3. Can you explain what looks weird?
I modified the focal loss a few minutes ago.
You can try again with the modified focal loss (#22, c3c6e99).
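For reference, a standard binary focal loss looks like the sketch below; this is the generic formulation from Lin et al. (2017), not necessarily the exact modification in c3c6e99:

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Generic binary focal loss (illustrative sketch)."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Usage on a dummy 256 x 256 segmentation mask
logits = torch.randn(2, 1, 256, 256)
targets = torch.randint(0, 2, (2, 1, 256, 256)).float()
print(binary_focal_loss(logits, targets))
```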
Thanks. I can use Jupyter to view the inference results for each image in the test set (all of the test set). Some of the segmentation results are indeed not very good. Below are plots of the metrics; they don't rise gradually, they keep going up and down.
I want to know whether my run is behaving normally but just scoring low, or whether there is something wrong with my initial configuration.
I think the fluctuation can be reduced with a lower learning rate.
Since the evaluation scores have pretty much converged from the beginning, a smaller learning rate, and thus smaller parameter updates, should be enough to improve performance.
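For example (a sketch with hypothetical values; the repo sets the learning rate via configs.yaml):

```python
import torch

model = torch.nn.Conv2d(3, 1, kernel_size=3)  # stand-in for the real model
# Hypothetical values: if the default were e.g. lr=1e-3, dropping to 1e-4
# makes each parameter update smaller, which should damp the oscillation.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```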
I would like to ask about the N normal samples in the memory bank. What is the value of N, and how is it determined? 🤔 @TooTouch
There is no fixed criterion for the N normal samples of the memory bank. The value of N in this repo is the one mentioned in the MemSeg paper.
N should be a sample size large enough to capture all the features of the normal data. This will vary depending on the characteristics of your data, but you can use your domain knowledge to determine it.
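As a rough illustration of the idea (a sketch only; the encoder choice, N, and feature shapes are placeholders, not the repo's exact implementation):

```python
import torch
from torchvision.models import resnet18

# Sketch: build a memory bank from N normal training images. MemSeg stores
# features of N normal samples; here we keep a single feature map per image
# for simplicity.
N = 30  # illustrative value; use the number from the paper/configs
encoder = resnet18(weights="IMAGENET1K_V1").eval()
# Truncate the encoder at an intermediate layer (up to layer2 here).
feature_extractor = torch.nn.Sequential(*list(encoder.children())[:-4])

normal_images = torch.randn(N, 3, 256, 256)  # stand-in for N normal samples
with torch.no_grad():
    memory_bank = feature_extractor(normal_images)  # shape (N, C, H', W')
print(memory_bank.shape)
```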
Why does the test path need to contain good samples in the MVTec AD dataset structure? If there is no good sample folder, the following error occurs:

```
only one class present in y_true. roc auc score is not defined in that case
```

Is it to calculate AUROC? @TooTouch
I'm sorry for the late reply.
The good samples are needed to calculate AUROC, because AUROC is the area under the ROC curve, which is constructed from the true positive rate and the false positive rate. Calculating these rates requires the good class in a binary classification setting.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
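A quick self-contained demonstration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

scores = np.array([0.2, 0.8, 0.6, 0.9])

# With both classes present (0 = good, 1 = anomalous) AUROC is defined:
print(roc_auc_score(np.array([0, 1, 0, 1]), scores))  # 1.0

# With only anomalous samples in y_true, sklearn raises:
# ValueError: Only one class present in y_true. ROC AUC score is not
# defined in that case.
roc_auc_score(np.array([1, 1, 1, 1]), scores)
```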