After patch recovery, the predicted image shows block artifacts and lacks smoothness.
740402059 opened this issue · 4 comments
Hello. I applied your model's method to my own dataset and found that the predicted image produced by the final patch-recovery step is blocky and not smooth. Apart from omitting the Earth-specific positional bias, the overall framework and backbone of my model follow the pseudocode you provided. How can I address this issue? Thank you.
Has anyone else encountered this issue?
I added a 3×3 convolution at both the input and the output, but the artifacts persist.
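One possible reason a single extra 3×3 convolution does not remove the artifacts: its receptive field is only 3×3, so with 4×4 patches it can blend at most a one-pixel band on each side of a block boundary. The NumPy sketch below illustrates this under stated assumptions: a single-channel map that is already constant inside each 4×4 block, a random kernel standing in for learned weights, and a hypothetical helper `conv3x3_same`. Whatever the kernel values, the inner 2×2 pixels of every block stay identical after the convolution.

```python
import numpy as np


def conv3x3_same(img: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Single-channel 3x3 cross-correlation (what DL frameworks call conv) with edge padding."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out + bias


p = 4                                   # patch size (4x4 patches)
n = 8                                   # patches per side
rng = np.random.default_rng(0)
coarse = rng.normal(size=(n, n))        # one value per patch
blocky = np.repeat(np.repeat(coarse, p, axis=0), p, axis=1)  # constant 4x4 blocks
kernel = rng.normal(size=(3, 3))        # stands in for learned 3x3 weights
smoothed = conv3x3_same(blocky, kernel, bias=0.1)

# View the result as (n, n, p, p): one p x p tile per patch.
tiles = smoothed.reshape(n, p, n, p).transpose(0, 2, 1, 3)
interior = tiles[:, :, 1:p - 1, 1:p - 1]   # pixels at least 1 px away from a block edge
interior_range = (interior.max(axis=(2, 3)) - interior.min(axis=(2, 3))).max()
print(interior_range)  # ~0: each block's inner 2x2 pixels are still identical
```

Stacking more convolutions (or using larger kernels) widens the blended band; a single 3×3 layer cannot reach the block interiors.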
Hi, I assume the first row shows the expected (ground-truth) maps and the second row shows the predicted maps. Is that right?
We have not run into this issue before, and I doubt anyone else has. It is not related to the Earth-specific bias. I think the most probable cause is the way you perform up-sampling. You could check the up-sampling function and its weights to see why the model generates blocky outputs (constant values within each block).
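To make "constant values in each block" measurable rather than purely visual, one option is to compute what fraction of a map's variance lies inside the p×p patches. The helper below (`within_block_fraction` is a hypothetical name, not part of either codebase) assumes a single-channel 2-D map whose height and width are multiples of the patch size. A prediction whose fraction is far below that of the matching ground-truth map is flatter inside each patch than it should be.

```python
import numpy as np


def within_block_fraction(img: np.ndarray, p: int) -> float:
    """Share of a 2-D map's variance that lies *inside* its p x p blocks.

    0 means every block is constant (pure blockiness). Assumes H and W are
    multiples of p.
    """
    h, w = img.shape
    blocks = img.reshape(h // p, p, w // p, p)
    block_means = blocks.mean(axis=(1, 3), keepdims=True)
    within = ((blocks - block_means) ** 2).mean()
    total = img.var()
    return float(within / total) if total > 0 else 0.0


# Toy comparison: a smooth field vs. the same field averaged into 4x4 blocks.
p = 4
yy, xx = np.mgrid[0:64, 0:64] / 8.0
smooth = np.sin(yy) + np.cos(xx)
block_avg = smooth.reshape(16, p, 16, p).mean(axis=(1, 3))
blocky = np.repeat(np.repeat(block_avg, p, axis=0), p, axis=1)

print(within_block_fraction(smooth, p))  # small but > 0
print(within_block_fraction(blocky, p))  # ~0
```

On real data, the natural comparison is `within_block_fraction(pred, 4)` against `within_block_fraction(target, 4)` for the same sample.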
Yes, your understanding is correct. The first row is the actual measurement and the second row is the prediction. The maps are radar observations of precipitation cloud clusters.
I think your model's approach is similar to Swin-Unet, but tailored to a different task.
https://github.com/HuCaoFighting/Swin-Unet
Since you provided only pseudocode, I built my implementation on top of Swin-Unet and modified it according to your approach. The patch_merge and patch_expand functions were re-implemented following your pseudocode. As with Swin-Unet, I encountered non-smooth outputs and block artifacts.
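For reference, the Linear-then-rearrange pattern that Swin-Unet's PatchExpand follows (as we understand it) can be sketched in NumPy as below. Bias and LayerNorm are left out, the shapes are illustrative, and `patch_expand` here is a simplified stand-in, not a reproduction of either codebase or of the pseudocode mentioned above.

```python
import numpy as np


def patch_expand(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Upsample a (H, W, C) token grid to (2H, 2W, C // 2).

    weight: (C, 2C) linear projection. Bias and LayerNorm are omitted.
    """
    h, w, c = x.shape
    y = x @ weight                          # (H, W, 2C): project each token
    y = y.reshape(h, w, 2, 2, c // 2)       # split channels into (p1, p2, C/2)
    y = y.transpose(0, 2, 1, 3, 4)          # (H, p1, W, p2, C/2)
    return y.reshape(2 * h, 2 * w, c // 2)  # interleave into the finer grid


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 32))
weight = rng.normal(size=(32, 64)) / np.sqrt(32)
out = patch_expand(x, weight)
print(out.shape)  # (16, 16, 16)
```

Every 2×2 group of output pixels is computed from a single input token, so this layer on its own does not mix information across neighbouring tokens.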
HuCaoFighting/Swin-Unet#70
Regarding the process of embedding the original image into 4×4 patches and then restoring the original resolution through patch recovery, I am concerned that non-overlapping patches may lead to weak correlation between neighbouring patches, resulting in blocky artifacts. I also share the view, mentioned in TransUNet, that a pure Transformer may struggle to capture local image details.
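The concern about non-overlapping patches can be checked directly. Assuming patch recovery projects each token independently onto a p×p patch (as a linear layer plus rearrange does, and as a transposed convolution whose kernel size equals its stride does, up to weight layout and bias), changing one token should alter exactly one p×p block of the output. The hypothetical `patch_recovery` below is a NumPy sketch of that assumption with random weights.

```python
import numpy as np


def patch_recovery(tokens: np.ndarray, weight: np.ndarray, p: int, c_out: int) -> np.ndarray:
    """Map a (H, W, C) token grid to a (H*p, W*p, c_out) image.

    Each token is projected on its own to p*p*c_out values (no bias), so output
    patches never overlap.
    """
    h, w, _ = tokens.shape
    y = tokens @ weight                                        # (H, W, p*p*c_out)
    y = y.reshape(h, w, p, p, c_out).transpose(0, 2, 1, 3, 4)  # (H, p, W, p, c_out)
    return y.reshape(h * p, w * p, c_out)


rng = np.random.default_rng(0)
p, c_in, c_out = 4, 8, 1
tokens = rng.normal(size=(6, 6, c_in))
weight = rng.normal(size=(c_in, p * p * c_out))

base = patch_recovery(tokens, weight, p, c_out)
perturbed_tokens = tokens.copy()
perturbed_tokens[2, 3] += 1.0                                  # change a single token
changed = np.abs(patch_recovery(perturbed_tokens, weight, p, c_out) - base) > 1e-12
rows, cols, _ = np.nonzero(changed)
print(rows.min(), rows.max(), cols.min(), cols.max(), changed.sum())  # 8 11 12 15 16
```

Only the 4×4 block belonging to token (2, 3) changes, so nothing in this layer enforces continuity across patch boundaries; any smoothness there has to be produced by the layers before it or by an added layer with a larger receptive field.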
https://github.com/Beckschen/TransUNet
I also plan to follow your advice and check the weights of the up-sampling step. Thank you very much for your guidance!
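For the weight check, one quick test is whether the recovery kernel actually varies across the p×p positions inside a patch; if it does not, every pixel in a patch receives the same value. The sketch assumes the recovery layer is a transposed convolution whose weight has been exported to NumPy in PyTorch's `ConvTranspose2d` layout `(in_channels, out_channels, kH, kW)`, e.g. via `.detach().cpu().numpy()`. `position_spread` is a hypothetical helper name, and a Linear-plus-rearrange recovery layer would first need its weight reshaped into this layout.

```python
import numpy as np


def position_spread(weight: np.ndarray) -> np.ndarray:
    """How much a patch-recovery kernel varies across the p x p output positions.

    weight: (C_in, C_out, p, p), the ConvTranspose2d layout. Returns one ratio per
    output channel; values near 0 mean every pixel in a patch receives (almost)
    the same combination of the token's features, i.e. constant blocks.
    """
    c_in, c_out, p, _ = weight.shape
    w = weight.transpose(1, 0, 2, 3).reshape(c_out, c_in, p * p)
    across_positions = w.std(axis=2).mean(axis=1)   # spread over the p*p positions
    overall = w.reshape(c_out, -1).std(axis=1)      # overall weight scale
    return across_positions / np.maximum(overall, 1e-12)


rng = np.random.default_rng(0)
healthy = rng.normal(size=(64, 1, 4, 4))            # varies per position
collapsed = np.repeat(rng.normal(size=(64, 1, 1, 1)), 16, axis=2).reshape(64, 1, 4, 4)
print(position_spread(healthy))    # well above 0
print(position_spread(collapsed))  # ~0
```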