fabio-deep/Variational-Capsule-Routing

example for 3 channel input?


Hi there,
Very impressed with your work and trying to use it for classification with 3-channel images. I made what I believe are the appropriate changes, but I error out with:

```
~/capsnet/varcaps/src/vb_routing.py in update_qlatent(self, a_i, V_ji)
    236         if verbose:
    237             print(f"shape of sum_p_j {sum_p_j} and p_j {p_j.shape}")
--> 238         return 1. / torch.clamp(sum_p_j, min=1e-11) * p_j
    239
    240     def reduce_icaps(self, x):

RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 8
```

Is there any example code showing 3-channel images, as the current sample code is all greyscale? (Or any other tips for handling 3-channel input?)
Thanks!!

Here's a bit of debug info if it helps:
```
p_j = torch.Size([8, 16, 16, 1, 1, 8, 8, 3, 3])
shape of sum_p_j torch.Size([8, 16, 1, 1, 1, 8, 8, 3, 3])
shape of sum_p_j torch.Size([8, 16, 1, 1, 1, 8, 8, 3, 3]) and p_j torch.Size([8, 16, 16, 1, 1, 8, 8, 3, 3])
routing result = torch.Size([8, 16, 16, 1, 1, 8, 8, 3, 3])
lnp_j = torch.Size([8, 16, 2, 1, 1, 1, 1, 8, 8])
p_j = torch.Size([8, 16, 2, 1, 1, 1, 1, 8, 8])
shape of sum_p_j torch.Size([32, 16, 1, 1, 1, 1, 1, 4, 4])
shape of sum_p_j torch.Size([32, 16, 1, 1, 1, 1, 1, 4, 4]) and p_j torch.Size([8, 16, 2, 1, 1, 1, 1, 8, 8])
err out at return
```

Hi, sorry to hear you're having trouble.
The code requires no modifications to work with multi-channel images; nonetheless, I've added the SVHN dataset as a working example of 3-channel input. Hope it helps!

FYI, to get this working I switched to greyscale for now, and that works as expected.

Thanks a ton @fabio-deep! It's quite late here so I'll run that in the morning first thing.
For reference, on a small pilot set of medical data I was able to get your capsnet (greyscale) to 100% train/val accuracy, whereas EfficientNet-B0 could not get close to that (surprising, but there are spatial and shadow aspects to the interpretation).
So I really appreciate you posting the SVHN code, especially so quickly; I'll go through it in the morning and update. Thanks again.

That is interesting, good work.
Hope you can resolve your issue and get it working using the example as a reference; if you need help, let me know.

By the way, make sure to adjust the kernel_size argument of the class routing layer to match your final feature map size, which depends on your input height and width.

In the case of [C, 32, 32] sized input images, it comes to FINAL_FMAP_SIZE = 4 when using the 3 layer capsnet example provided.

```python
ClassRouting = VariationalBayesRouting2d(
    in_caps=self.D, out_caps=self.n_classes,
    kernel_size=FINAL_FMAP_SIZE,  # <-- must match the final feature map size
    stride=1, pose_dim=self.P,
    cov='diag', iter=args.routing_iter,
    alpha0=1., m0=torch.zeros(self.D), kappa0=1.,
    Psi0=torch.eye(self.D), nu0=self.D+1, class_caps=True)
```
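
As a quick sanity check (a rough sketch with plain convs standing in for the capsule layers; the kernel/stride schedule of k=5/s=2, k=3/s=2, k=3/s=1 with no padding is taken from the example, and the channel widths are arbitrary), you can probe the spatial size empirically:

```python
import torch
import torch.nn as nn

# Plain-conv stand-in for the example 3 layer capsnet's downsampling path;
# only the spatial dimensions matter here, not the channels.
probe = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5, stride=2),
    nn.Conv2d(8, 8, kernel_size=3, stride=2),
    nn.Conv2d(8, 8, kernel_size=3, stride=1),
)
out = probe(torch.randn(1, 3, 32, 32))
print(out.shape[-1])  # 4 -> use this as FINAL_FMAP_SIZE for 32x32 inputs
```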

Thanks @fabio-deep - updating things now to test the 3-channel input and get that working.
Thanks also for the kernel tip! (That may actually have been the issue, as I had 48x48 inputs with the 3 channels...)
And getting a bit ahead: would you have any tips on scaling up the architecture for more typical input sizes of, say, 320x320?

The easiest way is to simply use a deeper feature extractor stem network that downsamples the input to a reasonable size before feeding it to the capsnet, and train everything end-to-end. In the examples provided, only a single conv layer is used.
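
For instance, a minimal sketch of such a stem (the layer count and channel widths here are placeholder assumptions, not from the repo), taking 320x320 down to 20x20 before the capsule layers:

```python
import torch
import torch.nn as nn

# Hypothetical deeper stem: four stride-2 convs halve the spatial dims
# each time, 320 -> 160 -> 80 -> 40 -> 20; train it end-to-end with the capsnet.
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(32), nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(128), nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(128), nn.ReLU(inplace=True),
)
feats = stem(torch.randn(1, 3, 320, 320))
print(feats.shape)  # torch.Size([1, 128, 20, 20]) -> feed this to the capsnet
```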

You can also just reduce the number of routing iterations, or the capsule pose dimensions to speed it up with minimal performance degradation.

Alternatives to these methods are basically open research questions.

Awesome, thanks again @fabio-deep - the deeper feature extractor stem makes sense here.

A note in case you happen to still be online: I keep getting a 7D tensor going into BatchNorm3d (instead of 5D) and I'm not sure why (while trying to get the 3-channel version working):

```
# working:
vbrouting self.m_j = torch.Size([64, 16, 16, 4, 4])

# failing:
vbrouting self.m_j = torch.Size([8, 2, 16, 5, 5, 1, 1])
```

```
~/capsnet/varcaps/src/vb_routing.py in forward(self, a_i, V_ji)
    133         if verbose:
    134             print(f"vbrouting self.m_j = {self.m_j.shape}")
--> 135         self.m_j = self.BN_v(self.m_j) # use 'else' above to deactivate BN_v for class_caps
    136
    137         # Out ← [?, C, P, P, F, F] ← [?, C, P*P, F, F]

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py in forward(self, input)
     83
     84     def forward(self, input):
---> 85         self._check_input_dim(input)
     86
     87         # exponential_average_factor is set to self.momentum

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py in _check_input_dim(self, input)
    322         if input.dim() != 5:
    323             raise ValueError('expected 5D input (got {}D input)'
--> 324                              .format(input.dim()))
    325
    326

ValueError: expected 5D input (got 7D input)
```

For reference, I have bs=8 and num_classes=2. I changed the kernel to 6, as inputs are 48x48 (vs 4 for 32x32), per your instructive comment. If you have any tips they'd be much appreciated, but I'll continue tracing here.
Thanks again!

For inputs of 48x48, assuming you're using the same 3 layer capsnet as in the example, the final feature map size should be `((((48 - 5) // 2 + 1) - 3) // 2 + 1) - 3 + 1 = 8`. Try that and let me know.
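
That arithmetic as a small helper (a sketch assuming the example's unpadded convs; the layer labels in the comments are my own reading of the 3 layer example):

```python
def conv_out(size, kernel, stride):
    # output spatial size of an unpadded (VALID) convolution
    return (size - kernel) // stride + 1

def final_fmap_size(in_size):
    s = conv_out(in_size, 5, 2)  # first conv layer:  k=5, s=2
    s = conv_out(s, 3, 2)        # second caps layer: k=3, s=2
    s = conv_out(s, 3, 1)        # third caps layer:  k=3, s=1
    return s

print(final_fmap_size(32))  # -> 4
print(final_fmap_size(48))  # -> 8
```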

That was it - now it's working - thanks a bunch @fabio-deep!
(I'll likely have other questions, if you don't mind, as I build on this for a production model, but I really appreciate all the help and your innovations with this architecture.) Closing this as well.