lgvaz/faststyle

Investigate why u-net performs poorly with style transfer

Opened this issue · 4 comments

lgvaz commented

Theoretically, it should be way better than TransformerNet.

It performs really well for super-resolution (which is almost the same task), and it's a more appropriate architecture for image-to-image problems overall.

lgvaz commented

This paper gives a great explanation of why U-net might fail in some cases. Quoting from the paper:

The U-net is "lazy". That is to say, if the U-net finds itself able to handle a problem in low-level layers, the high-level layers will not bother to learn anything. If we train a U-net to do a very simple task, "copying the image" as in fig. 4, where the inputs and outputs are the same, the loss value will drop to 0 immediately, because the first layer of the encoder discovers that it can simply transmit all features directly to the last layer of the decoder through the skip connection to minimize the loss. In this case, no matter how many times we train the U-net, the mid-level layers will not get any gradient.
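To make the failure mode concrete, here is a minimal toy sketch (plain PyTorch, all names hypothetical) of a single skip connection trained on the "copy image" task; once the final conv learns to pass the skip features through, almost nothing useful reaches the middle layers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LazyUnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 16, 3, padding=1)   # first encoder layer
        self.mid = nn.Sequential(                   # "mid-level layers"
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Conv2d(32, 3, 3, padding=1)   # sees skip + mid features

    def forward(self, x):
        e = self.enc(x)
        return self.dec(torch.cat([e, self.mid(e)], dim=1))  # skip concat

model = LazyUnet()
x = torch.rand(4, 3, 64, 64)
loss = F.mse_loss(model(x), x)  # the "copy image" task
loss.backward()
# After training on the identity task, the gradient reaching the mid
# layers tends to be tiny compared to the skip path, which is the
# "laziness" the paper describes.
print(model.enc.weight.grad.abs().mean().item(),
      model.mid[0].weight.grad.abs().mean().item())
```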

lgvaz commented

A strategy for solving the issue could be (see the sketch below):

  • Freeze the skip connections and train the network
  • After some time, unfreeze the skip connections and see what happens
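A rough sketch of how the gating could work, using a hypothetical `SkipGate` module wrapped around each skip connection (this isn't fastai API, just plain PyTorch):

```python
import torch
import torch.nn as nn

class SkipGate(nn.Module):
    """Multiplies skip features by 0 or 1 so a skip connection can be
    turned off without removing it (zeros keep the concat shapes valid)."""
    def __init__(self):
        super().__init__()
        self.register_buffer('enabled', torch.tensor(1.0))

    def forward(self, skip_feats):
        return skip_feats * self.enabled

def set_skips(model: nn.Module, enabled: bool):
    for m in model.modules():
        if isinstance(m, SkipGate):
            m.enabled.fill_(1.0 if enabled else 0.0)

# Phase 1: skips off, so the mid layers are forced to learn something.
# set_skips(unet, False); train(...)
# Phase 2: skips back on, continue training.
# set_skips(unet, True); train(...)
```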
lgvaz commented

The paper talks about "guide decoders", although it doesn't explain in depth what they mean.

I think what I can try is generating the image at each middle layer without the skip connections (basically repeating the following layers, but without skip connections). This would generate an image for each middle layer, so the gradient is always present. Something like the sketch below.
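This is just my reading of "guide decoders": a small decoder head attached to a mid-level feature map that upsamples straight to an image, with its loss added to the main loss so the mid layers always get gradient. Names and the 0.3 weight are made up:

```python
import torch.nn as nn
import torch.nn.functional as F

class GuideDecoder(nn.Module):
    """Projects a mid-level feature map to RGB and upsamples to image size,
    providing a direct gradient path into the middle of the network."""
    def __init__(self, in_ch, scale):
        super().__init__()
        self.to_img = nn.Conv2d(in_ch, 3, 1)  # 1x1 projection to RGB
        self.scale = scale

    def forward(self, feats):
        img = self.to_img(feats)
        return F.interpolate(img, scale_factor=self.scale,
                             mode='bilinear', align_corners=False)

# main_out, mid_feats = model(x)       # model also returns mid features
# guide_out = guide(mid_feats)
# loss = crit(main_out, target) + 0.3 * crit(guide_out, target)
```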

lgvaz commented

First try at modifying DynamicUnet failed miserably. Need to find a way to get the output of each UnetBlock with and without skip connections, and then use that to create multiple outputs from DynamicUnet.
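One thing to try next: instead of rewriting DynamicUnet, register forward hooks to collect each UnetBlock's output (a sketch; the import path is from fastai v1 and may differ in other versions):

```python
import torch.nn as nn
from fastai.vision.models.unet import UnetBlock

stored = []

def save_output(module, inputs, output):
    # Standard PyTorch forward-hook signature; stash the activation.
    stored.append(output)

def hook_unet_blocks(unet: nn.Module):
    handles = []
    for m in unet.modules():
        if isinstance(m, UnetBlock):
            handles.append(m.register_forward_hook(save_output))
    return handles

# handles = hook_unet_blocks(model)
# out = model(x)   # `stored` now holds one tensor per UnetBlock
# for h in handles: h.remove()
```

This only captures the output *with* the skip connection, though; getting the without-skip variant would still need changes inside UnetBlock itself.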