About the vertical direction coordinate encoding in coord_handler.py
Line 99 in 86e4715
- Why do you use self.ts_spatial_size instead of self.ss_spatial_size in the denominator? Is it a mistake or a special design?
- If it is a special design, could you please explain it? Thanks.
It is to reduce the test-time numerical domain gap. The size difference between the texture synthesizer input and the structure synthesizer input can be seen as a kind of pre-padding. We want that region to fall in the saturated region of tanh (0.995 to 1 after tanh), so that when we extrapolate vertical coordinates at test time, the coordinates won't be too far from their training distribution (~1).
This figure may help explain the concept: the red lines are the result of dividing by ss_spatial_size, and the green lines use ts_spatial_size. Each color has three lines, representing three different feature sizes. The horizontal axis is the raw coordinate value, and the vertical axis is the value after the tanh projection.
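For readers without the figure, here is a rough numerical sketch of the same idea. The exact expression in coord_handler.py is not shown in this thread, so the `tanh(raw / size)` form, the `project` helper, and the sizes 32 and 16 are all assumptions made up for illustration. It only shows that a smaller denominator pushes the border coordinate into the saturated part of tanh, so extrapolating further moves the value much less.

```python
import math

def project(raw_coord, denom):
    """Hypothetical helper: scale a raw vertical coordinate, then apply tanh."""
    return math.tanh(raw_coord / denom)

# Illustrative sizes only, NOT taken from the repository.
ss_spatial_size = 32  # assumed structure-synthesizer input height (larger)
ts_spatial_size = 16  # assumed texture-synthesizer input height (smaller)

# Hypothetical raw coordinate at the training border, and one extrapolated
# twice as far at test time.
train_edge = 48.0
test_edge = 96.0

by_ss_train = project(train_edge, ss_spatial_size)  # tanh(1.5), not saturated
by_ts_train = project(train_edge, ts_spatial_size)  # tanh(3.0), saturated

# How far the projected value drifts when extrapolating past the border.
shift_ss = project(test_edge, ss_spatial_size) - by_ss_train
shift_ts = project(test_edge, ts_spatial_size) - by_ts_train

print(f"divide by ss: train={by_ss_train:.4f}, shift at test={shift_ss:.4f}")
print(f"divide by ts: train={by_ts_train:.4f}, shift at test={shift_ts:.4f}")
```

Under these assumed numbers, dividing by the smaller size puts the border value above 0.995, and the extrapolated value stays within a small margin of it, which is the behaviour the green lines are described as having.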
OK, I got it. Thanks for the explanation.