Add only works for two layers of the same size
AndreJFBico opened this issue · 23 comments
Hi, so I'm working on setting up a fast style transfer network in code instead of importing it from a .pb file, so I can start improving it.
However, I'm having some issues trying to set up a residual layer. Here's the network:
styleNet.start
->> Convolution(convSize: ConvSize(outputChannels: 32, kernelSize: 9, stride: 1), neuronType: .none, id: "Variable_0")
->> InstanceNorm(shiftModifier: "1", scaleModifier: "2", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 64, kernelSize: 3, stride: 2), neuronType: .none, id: "Variable_3")
->> InstanceNorm(shiftModifier: "4", scaleModifier: "5", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 2), neuronType: .none, id: "Variable_6")
->> InstanceNorm(shiftModifier: "7", scaleModifier: "8", id: "Variable_")
->> Neuron( type: .relu)
->> ResidualLayer(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), layers:
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, id: "Variable_9")
->> InstanceNorm(shiftModifier: "10", scaleModifier: "11", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, id: "Variable_12")
->> InstanceNorm(shiftModifier: "13", scaleModifier: "14", id: "Variable_"))
When initializing, the following error appears: assertion failed: Add works for two layers of the same size: file /Users/Andre/Downloads/Bender-fix-issue-38/Sources/Layers/Add.swift, line 23
I also printed the layers in the network:
"PRINTING LAYERS"
": Bender.Start"
": Bender.Convolution"
": Bender.InstanceNorm"
": Bender.Neuron"
": Bender.Convolution"
": Bender.InstanceNorm"
": Bender.Neuron"
": Bender.Convolution"
": Bender.InstanceNorm"
": Bender.Neuron"
": Bender.Dummy"
": Bender.Identity"
": Bender.Add"
assertion failed: Add works for two layers of the same size: file /Users/Andre/Downloads/Bender-fix-issue-38/Sources/Layers/Add.swift, line 23
If you're wondering what kind of network I'm trying to emulate, it's this one:
https://github.com/lengstrom/fast-style-transfer/blob/master/src/transform.py
def net(image):
conv1 = _conv_layer(image, 32, 9, 1)
conv2 = _conv_layer(conv1, 64, 3, 2)
conv3 = _conv_layer(conv2, 128, 3, 2)
resid1 = _residual_block(conv3, 3)
resid2 = _residual_block(resid1, 3)
resid3 = _residual_block(resid2, 3)
resid4 = _residual_block(resid3, 3)
resid5 = _residual_block(resid4, 3)
conv_t1 = _conv_tranpose_layer(resid5, 64, 3, 2)
conv_t2 = _conv_tranpose_layer(conv_t1, 32, 3, 2)
conv_t3 = _conv_layer(conv_t2, 3, 9, 1, relu=False)
preds = tf.nn.tanh(conv_t3) * 150 + 255./2
return preds
Note: I have changed the InstanceNorm input to allow specific shift/scale modifiers; it's a temporary way to set up the weight ids.
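For readers puzzled by the id: "Variable_" pattern above, a minimal sketch of the naming scheme that change implies (a hypothetical helper, not Bender's API): the shift/scale weight ids are just the base id with the modifier appended.
// Hypothetical helper mirroring the naming used in the networks here:
// ("Variable_", "1") -> "Variable_1" (shift), ("Variable_", "2") -> "Variable_2" (scale).
func weightId(base: String, modifier: String) -> String {
    return base + modifier
}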
The layers do sanity checks on the sizes. In this case it's the Add, which comes from ResidualLayer.
So it seems that what comes before the residual layer and what it yields have different sizes. I'm trying to figure out why, as it looks good.
A few tips while I tackle it:
You can create your own blocks, as in the file you cite, by extending CompositeLayer and ResidualLayer.
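As a rough illustration of that approach (a sketch only, assuming CompositeLayer just exposes an input and an output endpoint the way ResidualLayer appears to; check Bender's source for the exact protocol requirements), a reusable conv + norm + activation block could look like:
// Sketch of a custom block via CompositeLayer (assumed protocol shape).
// InstanceNorm(id:) is an assumed initializer; adapt it to your version.
class ConvBlock: CompositeLayer {
    var input: NetworkLayer
    var output: NetworkLayer

    init(convSize: ConvSize, id: String) {
        let conv = Convolution(convSize: convSize, neuronType: .none, useBias: false, id: id)
        input = conv
        output = conv
            ->> InstanceNorm(id: id)
            ->> Neuron(type: .relu)
    }
}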
Convolution should not use bias here, as the example you provide uses TF convolutions without a bias.
I'm noticing there's a useless convSize in ResidualLayer. Gonna remove it.
The outputSize variable is passed down from the start layer: when initialize is called, each layer takes incoming[0].outputSize. However, for the Add itself, incoming[1] is nil.
So Add doesn't have 2 inputs?
The Add has 2 inputs; the output size of the second input is nil, while the output size of the first is correct.
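For reference, this is roughly the sanity check that fires, paraphrased from the assertion message (Add.swift, line 23; the real code may differ):
// Sketch of the size assertion in Add. With the second input's
// outputSize nil, the two sizes can never match and the assert fires.
assert(incoming[0].outputSize == incoming[1].outputSize,
       "Add works for two layers of the same size")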
@AndreJFBico Do you have a repo we can take a look at?
@dernster Yeah, I can provide it; I'll update the main post when it uploads.
In the meantime, this is the network I'm running right now:
// We have in total 48 layers and 48 weight variables
styleNet.start
->> Convolution(convSize: ConvSize(outputChannels: 32, kernelSize: 9, stride: 1), neuronType: .none, useBias: false, id: "Variable_0")
->> InstanceNorm(shiftModifier: "1", scaleModifier: "2", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 64, kernelSize: 3, stride: 2), neuronType: .none, useBias: false, id: "Variable_3")
->> InstanceNorm(shiftModifier: "4", scaleModifier: "5", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 2), neuronType: .none, useBias: false, id: "Variable_6")
->> InstanceNorm(shiftModifier: "7", scaleModifier: "8", id: "Variable_")
->> Neuron( type: .relu)
->> [Identity(), (
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_9")
->> InstanceNorm(shiftModifier: "10", scaleModifier: "11", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_12")
->> InstanceNorm(shiftModifier: "13", scaleModifier: "14", id: "Variable_")
->> Identity())]
->> Add()
->> [Identity(),(
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_15")
->> InstanceNorm(shiftModifier: "16", scaleModifier: "17", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_18")
->> InstanceNorm(shiftModifier: "19", scaleModifier: "20", id: "Variable_")
->> Identity())]
->> Add()
->> [Identity(),(
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_21")
->> InstanceNorm( shiftModifier: "22", scaleModifier: "23", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_24")
->> InstanceNorm(shiftModifier: "25", scaleModifier: "26", id: "Variable_")
->> Identity())]
->> Add()
->> [Identity(),(
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_27")
->> InstanceNorm(shiftModifier: "28", scaleModifier: "29", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_30")
->> InstanceNorm(shiftModifier: "31", scaleModifier: "32", id: "Variable_")
->> Identity())]
->> Add()
->> [Identity(),(
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_33")
->> InstanceNorm(shiftModifier: "34", scaleModifier: "35", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_36")
->> InstanceNorm(shiftModifier: "37", scaleModifier: "38", id: "Variable_")
->> Identity())]
->> Add()
->> ConvTranspose(size: ConvSize(outputChannels: 64, kernelSize: 3, stride: 2), id: "Variable_39")
->> InstanceNorm(shiftModifier: "40", scaleModifier: "41", id: "Variable_")
->> Neuron( type: .relu)
->> ConvTranspose(size: ConvSize(outputChannels: 32, kernelSize: 3, stride: 2), id: "Variable_42")
->> InstanceNorm(shiftModifier: "43", scaleModifier: "44", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 3, kernelSize: 9, stride: 1), neuronType: .none, useBias: false, id: "Variable_45")
->> InstanceNorm( shiftModifier: "46", scaleModifier: "47", id: "Variable_")
->> Neuron(type: .tanh)
->> ImageLinearTransform()
It runs, even though the end result still has some strange artifacts and it crashes on input resolutions higher than 256 (but those are other issues). The way I circumvented the Add issue was to add an Identity layer next to the InstanceNorm layer; that way the outputSize is passed correctly to the Add layer.
If the output is not OK, maybe what you can do is dump different layers' outputs and compare them to the Python implementation, to see if they match, in order to find where the error is.
But have you tried saving the protobuf from the Python code with benderthon and loading it with Bender?
Good suggestion. In terms of trying to import a protobuf, yes, I have done that and it works properly. It still crashes with an input size of 1024 for some reason, but for images with input sizes lower than 512 it works OK.
The reason I'm trying to define the network in code is so that I can understand it better and also work with it.
@AndreJFBico Hi! We couldn't reproduce the original issue; could you please provide a repo with it? Additionally, an example of a larger input size causing a crash would be helpful.
@AndreJFBico did you figure this out?
@dernster I recreated this by generating a model from fast-style-transfer, using benderthon to get the .pb, and running the code taken from the example in my own project with the new .pb. The only .pb that works, g_and_w2, is the one provided with the Bender style example.
I also get the same error when I make the swap in the example project.
Yeah, now we were able to reproduce the error, but we still don't know what the cause is. And we don't have an ETA for this currently.
If you want, you can go ahead and try to find it. We'll check it out when we have time (hopefully in the following weeks).
I'm sorry that I never got that repo ready; it's just that I started changing the core of Bender itself and I no longer had things as they were done originally.
Well, I sort of figured it out; however, I couldn't make it work with the .pb file right off the bat.
And since I wanted to experiment with the code itself, I converted lengstrom's fast-style-transfer network to Bender's DSL:
styleNet.start
->> Convolution(convSize: ConvSize(outputChannels: 32, kernelSize: 9, stride: 1), neuronType: .none, useBias: false, id: "Variable_0")
->> InstanceNorm(shiftModifier: "1", scaleModifier: "2", id: "Variable_")
->> Neuron( type: .elu)
->> Convolution(convSize: ConvSize(outputChannels: 64, kernelSize: 3, stride: 2), neuronType: .none, useBias: false, id: "Variable_3")
->> InstanceNorm(shiftModifier: "4", scaleModifier: "5", id: "Variable_")
->> Neuron( type: .elu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 2), neuronType: .none, useBias: false, id: "Variable_6")
->> InstanceNorm(shiftModifier: "7", scaleModifier: "8", id: "Variable_")
->> Neuron( type: .elu)
->> [Identity(), (
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_9")
->> InstanceNorm(shiftModifier: "10", scaleModifier: "11", id: "Variable_")
->> Neuron( type: .elu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_12")
->> InstanceNorm(shiftModifier: "13", scaleModifier: "14", id: "Variable_")
->> Identity())]
->> Add()
->> [Identity(),(
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_15")
->> InstanceNorm(shiftModifier: "16", scaleModifier: "17", id: "Variable_")
->> Neuron( type: .elu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_18")
->> InstanceNorm(shiftModifier: "19", scaleModifier: "20", id: "Variable_")
->> Identity())]
->> Add()
->> [Identity(),(
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_21")
->> InstanceNorm( shiftModifier: "22", scaleModifier: "23", id: "Variable_")
->> Neuron( type: .elu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_24")
->> InstanceNorm(shiftModifier: "25", scaleModifier: "26", id: "Variable_")
->> Identity())]
->> Add()
->> [Identity(),(
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_27")
->> InstanceNorm(shiftModifier: "28", scaleModifier: "29", id: "Variable_")
->> Neuron( type: .elu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_30")
->> InstanceNorm(shiftModifier: "31", scaleModifier: "32", id: "Variable_")
->> Identity())]
->> Add()
->> [Identity(),(
Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_33")
->> InstanceNorm(shiftModifier: "34", scaleModifier: "35", id: "Variable_")
->> Neuron( type: .elu)
->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_36")
->> InstanceNorm(shiftModifier: "37", scaleModifier: "38", id: "Variable_")
->> Identity())]
->> Add()
->> ConvTranspose(size: ConvSize(outputChannels: 64, kernelSize: 3, stride: 2), id: "Variable_39")
->> InstanceNorm(shiftModifier: "40", scaleModifier: "41", id: "Variable_")
->> Neuron( type: .relu)
->> ConvTranspose(size: ConvSize(outputChannels: 32, kernelSize: 3, stride: 2), id: "Variable_42")
->> InstanceNorm(shiftModifier: "43", scaleModifier: "44", id: "Variable_")
->> Neuron( type: .relu)
->> Convolution(convSize: ConvSize(outputChannels: 3, kernelSize: 9, stride: 1), neuronType: .none, useBias: false, id: "Variable_45")
->> InstanceNorm( shiftModifier: "46", scaleModifier: "47", id: "Variable_")
->> Neuron(type: .tanh)
->> ImageLinearTransform()
It's a one-to-one conversion of the following file: https://github.com/lengstrom/fast-style-transfer/blob/master/src/transform.py
I also used benderthon to export the layer weights individually; I had to alter its conversion script, as only some weights require transposing.
I also changed a bit how the weights are looked up, but the principle should be the same.
About the "Add works for two layers of the same size" issue: I figured out that I had to add an Identity layer next to the InstanceNorm layer that comes before an Add layer; that way the issue disappeared.
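Condensed from the network above, the working pattern is:
// The trailing Identity inside the residual branch is the workaround:
// it carries outputSize through to Add, whose second input otherwise
// ends up with a nil size and trips the assertion.
->> [Identity(), (
    Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_9")
    ->> InstanceNorm(shiftModifier: "10", scaleModifier: "11", id: "Variable_")
    ->> Neuron(type: .relu)
    ->> Convolution(convSize: ConvSize(outputChannels: 128, kernelSize: 3, stride: 1), neuronType: .none, useBias: false, id: "Variable_12")
    ->> InstanceNorm(shiftModifier: "13", scaleModifier: "14", id: "Variable_")
    ->> Identity())] // <- without this Identity the Add assertion fires
->> Add()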
I tried looking into the code for how the layers were set up, but I found nothing problematic; somehow the second incoming node of the Add layer loses its convSize attribute, and I never really found out why.
If they manage to fix this somehow, I bet your conversion should be simple and painless.
Cheers.
Does Bender by any chance have notes on how they generated their .pb files?
The MNIST sample came from the benderthon sample. The style transfer one probably came directly from the repo we were talking about; I don't know why it isn't working now. But we will take a look at this error as soon as we can.
@mdramos I updated benderthon so the sample uses a simpler way to generate the protobuf. Maybe take a look at it.
OK, thanks! Yeah, I actually got benderthon working just fine yesterday.
I am using the lengstrom project. While using the protobuf file I faced
"Fatal error: Index out of range"
in:
var kernelWidth: Int {
    return Int(dim[1].size) // crashes when the tensor shape has fewer than 2 dims
}
What is the issue?
Attached is my .pb model.
@mdramos what changes did you make to get it working?
The issue with your graph seems to be the ExpandDims at the beginning. I am looking into it.
There are two issues with your graph. The first one is fixed with #111 and has to do with the ExpandDims.
The second one is that Mul and Add with a scalar are not supported yet. From what I saw, they are only used at the end to scale the final result. What you can do there is to cut the graph after the Tanh node (when freezing) and then add a postprocessing layer to do the scaling, like this:
Neuron(type: ActivationNeuronType.custom(neuron: MPSCNNNeuronLinear(device: Device.shared, a: 2.0, b: -1)), id: "scale_neuron")
where a and b are the scale and offset.
If there are any other questions please open a new issue.
Working:
preds = tf.add(tf.nn.tanh(conv_t3) * 150, 255./2, name="preds")
How did you calculate a and b as 2 and -1? There is a variation in the output image between Python and iOS. What values do we need to put for a and b?
2 and -1 are just an example. The MPSCNNNeuronLinear documentation says it calculates a*x + b. So yours would be something like a = 150 and b = 255/2.
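Putting the two answers together for lengstrom's scaling (tanh(conv_t3) * 150 + 255/2), the postprocessing layer from the earlier suggestion would presumably become:
// Same pattern as the example above, with lengstrom's constants:
// a = 150 (scale), b = 255/2 = 127.5 (offset), computing a*x + b.
Neuron(type: ActivationNeuronType.custom(neuron: MPSCNNNeuronLinear(device: Device.shared, a: 150.0, b: 127.5)), id: "scale_neuron")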