vlievin/biva-pytorch

Network configuration for 64x64 natural image dataset

Opened this issue · 4 comments

Hi,
Nice work! I am curious about how BIVA performs on the CUB bird dataset. I want to make it work at 64x64 resolution, so I looked for the CelebA configuration to use as a reference, but could not find one in this repository. In particular, I am not sure about the network configuration. I followed the instructions in the original paper and wrote the configuration below. Could you please check whether it is correct? Many thanks!

# ConvNormal is the convolutional stochastic layer from the biva package
# (the exact import path is assumed here):
from biva import ConvNormal

def get_deep_vae_cub():
    filters = 64
    no_layers = 2
    enc = []
    z = []

    # z_1: stride-2 block downsamples 64x64 -> 32x32
    enc_z1 = [[filters, 7, 1]] * no_layers
    enc_z1 += [[filters, 7, 2]]
    z_1 = {'N': 20, 'kernel': 32, 'block': ConvNormal}
    enc += [enc_z1]
    z += [z_1]

    # z_2: stride-2 block downsamples 32x32 -> 16x16
    enc_z2 = [[filters, 5, 1]] * no_layers
    enc_z2 += [[filters, 5, 2]]
    z_2 = {'N': 19, 'kernel': 16, 'block': ConvNormal}
    enc += [enc_z2]
    z += [z_2]

    for i in range(3, 11):
        # layers 3-10: stride-1 blocks at 16x16
        enc_zi = [[filters, 3, 1]] * no_layers
        enc_zi += [[filters, 3, 1]]
        z_i = {'N': 21-i, 'kernel': 16, 'block': ConvNormal}
        enc += [enc_zi]
        z += [z_i]

    # z_11: stride-2 block downsamples 16x16 -> 8x8
    enc_z11 = [[filters, 3, 1]] * no_layers
    enc_z11 += [[filters, 3, 2]]
    z_11 = {'N': 10, 'kernel': 8, 'block': ConvNormal}
    enc += [enc_z11]
    z += [z_11]

    for i in range(12, 20):
        # layers 12-19: stride-1 blocks with 2 * filters channels at 8x8
        enc_zi = [[2 * filters, 3, 1]] * no_layers
        enc_zi += [[2 * filters, 3, 1]]
        z_i = {'N': 21 - i, 'kernel': 8, 'block': ConvNormal}
        enc += [enc_zi]
        z += [z_i]

    # z_20: stride-2 block downsamples 8x8 -> 4x4
    enc_z20 = [[2 * filters, 3, 1]] * no_layers
    enc_z20 += [[2 * filters, 3, 2]]
    z_20 = {'N': 1, 'kernel': 4, 'block': ConvNormal}
    enc += [enc_z20]
    z += [z_20]

    return enc, z
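
For context, I intend to plug the returned configuration into the model roughly the way the repository's MNIST helper is used; the import path and the DeepVae keyword names below are assumptions on my side:

from biva import DeepVae  # assumed import path

stages, latents = get_deep_vae_cub()
# tensor_shp describes the 64x64 RGB input; the argument names are assumed
model = DeepVae(tensor_shp=(-1, 3, 64, 64), stages=stages, latents=latents)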

Hi, the CelebA experiments were done using the Tensorflow version of the code, so unfortunately I am less confident about the implementation details. Your architecture (pasted below in plain text) looks correct. The CUB dataset is much smaller, so you may need to decrease the size of the model (reduce the number of filters and the number of deterministic layers) and potentially increase the dropout rate. Alternatively, you can downsample the images and use a shallower architecture; a hypothetical reduced configuration is sketched after the listing.

for l = L ... 1
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 1, 'kernel': 4, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 2]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 2, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[128, 3, 1], [128, 3, 1], [128, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 3, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[128, 3, 1], [128, 3, 1], [128, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 4, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[128, 3, 1], [128, 3, 1], [128, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 5, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[128, 3, 1], [128, 3, 1], [128, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 6, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[128, 3, 1], [128, 3, 1], [128, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 7, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[128, 3, 1], [128, 3, 1], [128, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 8, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[128, 3, 1], [128, 3, 1], [128, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 9, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[128, 3, 1], [128, 3, 1], [128, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 10, 'kernel': 8, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 2]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 11, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 12, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 13, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 14, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 15, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 16, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 17, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 18, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 3, 1], [64, 3, 1], [64, 3, 1]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 19, 'kernel': 16, 'block': 'ConvNormal'}
blocks [[64, 5, 1], [64, 5, 1], [64, 5, 2]]
----------------------------------------------------------------------------------------------------
stochastic-layer {'N': 20, 'kernel': 32, 'block': 'ConvNormal'}
blocks [[64, 7, 1], [64, 7, 1], [64, 7, 2]]

@vlievin Thanks for your suggestions. I just ran this configuration and encountered the following error. Do you have any idea how to solve it? The network architecture is still puzzling to me, and I am having difficulty finding the cause.

Traceback (most recent call last):
  File "/home/s2006466/workspace/biva-pytorch/run_deepvae.py", line 145, in <module>
    model(x)
  File "/home/s2006466/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s2006466/workspace/biva-pytorch/biva/model/deepvae.py", line 177, in forward
    data = self.generate(posteriors, N=x.size(0), **kwargs)
  File "/home/s2006466/workspace/biva-pytorch/biva/model/deepvae.py", line 137, in generate
    x, data = stage(x, posterior, **kwargs)
  File "/home/s2006466/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s2006466/workspace/biva-pytorch/biva/model/stage.py", line 582, in forward
    _, td_p_data = self.td_stochastic(d, inference=False, sample=False, **kwargs)
  File "/home/s2006466/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s2006466/workspace/biva-pytorch/biva/model/stochastic.py", line 217, in forward
    mu, logvar = self.compute_logits(x, inference)
  File "/home/s2006466/workspace/biva-pytorch/biva/model/stochastic.py", line 199, in compute_logits
    logits = self.px2z(x)
  File "/home/s2006466/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s2006466/workspace/biva-pytorch/biva/layers/convolution.py", line 133, in forward
    self.init_parameters(x)
  File "/home/s2006466/workspace/biva-pytorch/biva/layers/convolution.py", line 156, in init_parameters
    x = self.conv(x)
  File "/home/s2006466/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s2006466/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/s2006466/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 20 64 8 8, expected input[48, 128, 9, 9] to have 64 channels, but got 128 channels instead
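
The failure itself is an ordinary channel mismatch: a convolution in the top-down path was built for 64 input channels (weight of size 20 x 64 x 8 x 8) but receives the 128-channel features produced by the 2 * filters blocks. A standalone snippet that reproduces the same error, independent of BIVA:

import torch
import torch.nn as nn

# a conv built for 64 input channels, matching the weight size in the traceback
conv = nn.Conv2d(in_channels=64, out_channels=20, kernel_size=8)
# ...fed a 128-channel feature map, matching the reported input shape
x = torch.randn(48, 128, 9, 9)
conv(x)  # RuntimeError: expected input[48, 128, 9, 9] to have 64 channels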

@vlievin Hi, I have temporarily worked around this problem by setting all the convolutional channels to 64, although the original paper suggests 128 channels for the last 8 layers.
I think there may be a problem with handling a different number of channels in two consecutive layers, but I can't pin down the bug right now.
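
Concretely, the workaround amounts to the following change in the configuration above (with the same substitution in enc_z20):

# layers 12-19: use filters instead of 2 * filters, so that every stage
# receives the same number of input channels
for i in range(12, 20):
    enc_zi = [[filters, 3, 1]] * no_layers  # was: [[2 * filters, 3, 1]]
    enc_zi += [[filters, 3, 1]]
    z_i = {'N': 21 - i, 'kernel': 8, 'block': ConvNormal}
    enc += [enc_zi]
    z += [z_i]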

Yes, indeed the issue comes from the way the tensor shapes are inferred. The problem doesn't occur in Tensorflow, where the shapes of the tensors are inferred automatically. I will leave this issue open; it shouldn't be too complicated to solve. Unfortunately, I don't have the time to fix it right now.

TL;DR: The current implementation only handles convolutions that use the same number of feature maps throughout the model.
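
One possible direction for a fix (a sketch of the idea only, not code from this repository): defer building each convolution until the first forward pass, so that in_channels is read from the actual input, which is essentially what the Tensorflow version gets from automatic shape inference:

import torch.nn as nn

class ShapeInferredConv2d(nn.Module):
    """Hypothetical wrapper that builds the underlying Conv2d lazily,
    inferring in_channels from the first input tensor."""

    def __init__(self, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        self._args = (out_channels, kernel_size, stride, padding)
        self.conv = None  # created on the first forward pass

    def forward(self, x):
        if self.conv is None:
            out_channels, kernel_size, stride, padding = self._args
            # in_channels is read from the input itself, so consecutive
            # stages with different feature counts are handled correctly
            self.conv = nn.Conv2d(x.size(1), out_channels, kernel_size,
                                  stride=stride, padding=padding).to(x.device)
        return self.conv(x)

One caveat: the parameters only exist after the first forward pass, so the optimizer has to be created afterwards. Recent PyTorch versions ship nn.LazyConv2d, which implements the same idea.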