A style transfer demo
Notice that the original image data is compressed at each layer of a CNN. This compression (for example, a pooling operation) loses some detail but gains efficiency. Imagine representing the original image by the feature maps output by one of the convolutional layers: the result is a blurry image without fine details.
original | representation |
---|---|
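
To make the loss of detail concrete, here is a minimal NumPy sketch of 2x2 max pooling. It is an illustration only: the helper name `max_pool_2x2` and the toy 4x4 "image" are assumptions, not part of the demo's code.

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a 2-D array by taking the max over non-overlapping 2x2 blocks."""
    h, w = x.shape
    # Crop to even dimensions so the array splits cleanly into 2x2 blocks.
    x = x[: h - h % 2, : w - w % 2]
    blocks = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2)
    return blocks.max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)  # toy stand-in for an image
pooled = max_pool_2x2(image)

print(image.shape, "->", pooled.shape)  # (4, 4) -> (2, 2)
print(pooled)                           # only the max of each block survives
```

Sixteen values go in and four come out, so three quarters of the original values can no longer be recovered from the pooled output.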
As a result, there is "room" to balance content and style: any details that were not filled in by the content can be filled in with style.
There are three inputs: the content image, the style image, and the target image (the image being optimized). The optimization objective is a weighted combination of the content loss and the style loss.
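
Written as a formula, the objective could look like the following. The weight symbols $\alpha$ and $\beta$ are assumed names for illustration; the demo does not state which symbols or values it uses.

$$
L_{\text{total}} = \alpha \, L_{\text{content}} + \beta \, L_{\text{style}}
$$

A larger $\alpha$ relative to $\beta$ keeps the target closer to the content image, while a larger $\beta$ pushes it toward the style image.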
Gram Matrix
- "Auto" (self) - correlation between a thing and itself
- "Correlation" - how related one thing is to another
- "Autocorrelation" - how related X is to itself
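Note the shapes involved in the Gram Matrix: assuming the feature map is flattened so that `X.shape` is `(C, H*W)`, the Gram Matrix `G = X @ X.T` has `G.shape` equal to `(C, C)`.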
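
A minimal NumPy sketch of a Gram matrix computation, assuming a single feature map of shape `(C, H, W)` that is flattened to `(C, H*W)`. The function name `gram_matrix` and the random input are illustrative, and no normalization is applied here, although some implementations divide by the number of elements.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map with shape (C, H, W)."""
    c, h, w = features.shape
    x = features.reshape(c, h * w)  # one row per channel
    return x @ x.T                  # channel-by-channel correlations, shape (C, C)

rng = np.random.default_rng(0)
features = rng.standard_normal((3, 4, 5))  # stand-in for a CNN feature map
g = gram_matrix(features)

print(g.shape)  # (3, 3)
```

Entry `g[i, j]` is the dot product of channel `i` with channel `j`, so the spatial layout is discarded and only the co-occurrence of features remains.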
Content loss
- Pass the content image and the target image through the same pre-trained CNN, such as VGG
- Calculate the Mean Squared Error (MSE) between these two outputs
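
The two steps above can be sketched in NumPy as follows. The small arrays stand in for the CNN outputs of the content and target images; running a real network such as VGG is outside this sketch, and the name `content_loss` is an assumption.

```python
import numpy as np

def content_loss(content_features, target_features):
    """Mean squared error between two feature maps of the same shape."""
    return np.mean((target_features - content_features) ** 2)

# Made-up feature values, standing in for the CNN outputs.
content_features = np.array([[1.0, 2.0], [3.0, 4.0]])
target_features = np.array([[1.0, 2.0], [3.0, 6.0]])

print(content_loss(content_features, target_features))  # 1.0
```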
Style loss
- Pass the style image through the same CNN
- Grab the output at five different locations and calculate the Gram Matrix
- Do the same thing for the target image
- Calculate the MSE between the Gram Matrix of the style image and target image
- Calculate the weighted sum of these MSEs
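
The steps above can be sketched in NumPy as follows. The five feature-map shapes, the equal layer weights, and the names `gram_matrix` and `style_loss` are all assumptions for illustration. In the real demo the feature maps would come from five locations in the CNN.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map, shape (C, C)."""
    c, h, w = features.shape
    x = features.reshape(c, h * w)
    return x @ x.T

def style_loss(style_features, target_features, layer_weights):
    """Weighted sum of per-layer MSEs between Gram matrices."""
    total = 0.0
    for s, t, weight in zip(style_features, target_features, layer_weights):
        total += weight * np.mean((gram_matrix(t) - gram_matrix(s)) ** 2)
    return total

rng = np.random.default_rng(0)
# Five stand-in feature maps with made-up shapes, one per tapped layer.
shapes = [(4, 8, 8), (8, 4, 4), (8, 4, 4), (16, 2, 2), (16, 2, 2)]
style_features = [rng.standard_normal(s) for s in shapes]
target_features = [rng.standard_normal(s) for s in shapes]
layer_weights = [0.2] * 5  # hypothetical equal weights

print(style_loss(style_features, target_features, layer_weights))
print(style_loss(style_features, style_features, layer_weights))  # 0.0
```

The loss is zero when the target's Gram matrices match the style image's at every layer, which is what the optimization pushes toward.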
Total loss
Add the content loss and the style loss together to get the total loss.
style_image | content_image | output |
---|---|---|
loss
iter=0, loss=3710.9208984375
iter=1, loss=1113.28857421875
iter=2, loss=732.7880249023438
iter=3, loss=569.886474609375
iter=4, loss=478.1509704589844
iter=5, loss=423.4969787597656
iter=6, loss=385.49444580078125
iter=7, loss=358.8546447753906
iter=8, loss=337.9999084472656
iter=9, loss=321.8232116699219
duration: 0:00:50.642452