dong03/MVSS-Net

some questions about mvss-net, sincerely

yuanhanghuang opened this issue · 5 comments

Hello, I highly appreciate your recent work, which achieved SOTA results, but there are still a few things I want to figure out. My questions are listed as follows:

(1) In the SOTA comparison, I noticed you used CASIA V2 as the training dataset and other public datasets as the testing datasets, but what is the composition of the validation set?

(2) In the paper, the description of the online data augmentation is rather short and brief, but one clear point is that you use both tampered and non-tampered images. I wonder how the "naive manipulations either by cropping and pasting a squared area" are implemented: is it just cropping and pasting within the same image from the non-tampered dataset?

(3) Recently, another question has been bothering me greatly: previous work such as GSR-Net did not take non-tampered data into account, and with these extra non-tampered datasets MVSS-Net achieves much greater progress (some metrics improve by more than 10 percentage points). However, is it still a fair comparison when the amount of training data is more than doubled?

Hope you are doing well; looking forward to your detailed answers!

Thanks for your interest and questions.
(1) We take the same validation set as used in DEFACTO;

(2) The flipping, blurring, and compression are applied to both manipulated and authentic images. The naive manipulation is applied to authentic images only, and mainly consists of copy-moving or inpainting a random square or circle. It involves no splicing, as splicing is the easiest of the three manipulation types. Such automatic manipulation may not be comparable with human-made tampering, yet it contains obvious patterns in the noise and edge views, which benefits our multi-view feature learning.
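For concreteness, here is a minimal sketch of what such a naive copy-move / inpainting manipulation could look like. The patch-size range, the Telea inpainting choice, and all function names are illustrative assumptions, not the exact implementation:

```python
import numpy as np
import cv2

def naive_copy_move(img, min_size=32, max_size=96):
    """Copy-move a random square patch within the same authentic image;
    return the manipulated image and its binary ground-truth mask.
    Assumes the image is larger than the patch. Sizes are illustrative."""
    h, w = img.shape[:2]
    size = np.random.randint(min_size, max_size + 1)
    # Random source and destination top-left corners, kept inside the image.
    sy, sx = np.random.randint(0, h - size), np.random.randint(0, w - size)
    dy, dx = np.random.randint(0, h - size), np.random.randint(0, w - size)
    out = img.copy()
    out[dy:dy + size, dx:dx + size] = img[sy:sy + size, sx:sx + size]
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[dy:dy + size, dx:dx + size] = 255  # pasted region counts as tampered
    return out, mask

def naive_inpaint(img, min_size=32, max_size=96):
    """Erase and inpaint a random square region of an authentic image."""
    h, w = img.shape[:2]
    size = np.random.randint(min_size, max_size + 1)
    y, x = np.random.randint(0, h - size), np.random.randint(0, w - size)
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y:y + size, x:x + size] = 255
    out = cv2.inpaint(img, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
    return out, mask
```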

(3) Introducing authentic images into the training process is necessary for real applications; however, our experiments show that it is not a straightforward thing. As shown in the top two rows of the ablation study table, simply introducing authentic images even leads to a clear drop in pixel-level performance on manipulated images, because the authentic images force the model to strike a balance between sensitivity and specificity. According to the ablation experiments and the related analysis in the paper, the improvement of MVSS-Net comes not from the amount of data, but from semantic-agnostic feature extraction via multi-view feature learning.
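To make the trade-off concrete, a minimal sketch of image-level sensitivity and specificity under a detection threshold (the names, threshold, and score format are illustrative assumptions, not our evaluation code):

```python
import numpy as np

def sensitivity_specificity(image_scores, labels, t=0.5):
    """Image-level trade-off: `image_scores` are per-image manipulation
    probabilities, `labels` are 1 for manipulated, 0 for authentic.
    Raising sensitivity (catching more forgeries) typically lowers
    specificity (more authentic images flagged), and vice versa."""
    pred = np.asarray(image_scores) >= t
    labels = np.asarray(labels).astype(bool)
    sensitivity = np.logical_and(pred, labels).sum() / max(labels.sum(), 1)
    specificity = np.logical_and(~pred, ~labels).sum() / max((~labels).sum(), 1)
    return sensitivity, specificity
```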

Hope these help; feel free to contact us further.

Thanks for your patience and guidance!

Sorry to trouble you again; I still have a problem involving the optimal threshold. The following image is recaptured from the CSDN platform, and it shows that many people encounter the same problem as me. We took the optimal-F1 calculation method from the source code of Constrained R-CNN, but cannot get metric values close to those reported in the MVSS-Net paper.
[Screenshot recaptured from CSDN, showing other users reporting the same F1 discrepancy.]

In detail, I calculate the precision and recall of each image under different thresholds, and then take the optimal F1 for that image as the maximum over the list [(2 * precision * recall) / (precision + recall + eps)]. After that, the average F1 over the whole dataset is recorded as the final indicator.
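In code, my procedure looks roughly like this (the threshold grid, the eps value, and the function names are my own choices, not taken from any official evaluation script):

```python
import numpy as np

def optimal_f1(pred, gt, thresholds=np.linspace(0.0, 1.0, 101), eps=1e-8):
    """Per-image optimal F1: sweep thresholds over the predicted
    probability map `pred` (H x W, values in [0, 1]) against the binary
    ground-truth mask `gt`, and keep the best F1."""
    gt = gt.astype(bool)
    best = 0.0
    for t in thresholds:
        bin_pred = pred >= t
        tp = np.logical_and(bin_pred, gt).sum()
        precision = tp / (bin_pred.sum() + eps)
        recall = tp / (gt.sum() + eps)
        f1 = 2 * precision * recall / (precision + recall + eps)
        best = max(best, f1)
    return best

# Dataset-level score: mean of the per-image optimal F1s.
# scores = [optimal_f1(p, g) for p, g in zip(pred_maps, gt_masks)]
# dataset_f1 = float(np.mean(scores))
```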

But the result is far from the reported one (0.62 from ours vs. 0.753 from yours on CASIA V1). I don't know which part is wrong, or whether you have used another calculation method that achieves a higher metric value. I would be glad to hear more details.

Good luck!

That's quite interesting. According to your description, there seem to be some discrepancies between our evaluations of optimal F1 across all the methods, as there is no standard, unified code. I'm figuring it out with my colleagues. However, as these prior works and MVSS-Net all come with code / pre-trained models, if you do need numbers urgently, I suggest applying a unified evaluation criterion yourself; that seems fair for measuring their relative performance. Lastly, I still suggest a fixed threshold of 0.5, as it is unified and more practical.
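For comparison with the optimal-F1 sweep above, the fixed-threshold variant could be as simple as this sketch (threshold and eps values are assumptions; names are illustrative):

```python
import numpy as np

def fixed_f1(pred, gt, t=0.5, eps=1e-8):
    """Pixel-level F1 of a probability map `pred` against a binary
    ground-truth mask `gt` at one fixed threshold `t`."""
    gt = gt.astype(bool)
    bin_pred = pred >= t
    tp = np.logical_and(bin_pred, gt).sum()
    precision = tp / (bin_pred.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)
```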

But you still haven't explained how the optimal threshold for MVSS-Net is determined, and why F1 doesn't reach 0.753. Your open-source program also doesn't seem to reach the reported 0.824 on COVER. It's strange that F1 can be doubled or even tripled just by adjusting the threshold.