Dootmaan/MT-UNet

Why not add an MLP?

Chenguang-Wang opened this issue · 8 comments

Why not add an MLP after computing the various attentions?

I have another question.
Since my environment is different from yours, I would like to know how many GPUs you used and how long the training took.
Thanks!

Hi @Chenguang-Wang and thank you for your question.

The experiments were conducted on a single GTX 1080Ti (11 GB). Training takes about 3-5 hours on ACDC but considerably longer on Synapse (perhaps 8-12 hours or more; I'm not entirely sure right now).

@Dootmaan Thank you for your answer.
Why not add an MLP after computing the various attentions?

If you are interested in applying an MLP after the original concatenation, you could try doing it in the MLP-Mixer way. By performing token mixing followed by channel mixing, the local attention maps can be properly mixed at a lower computational cost. However, in our paper, since MTM is already a global operation (much like an MLP), we thought it might not be necessary to additionally use an MLP layer.
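
For reference, here is a minimal sketch of the MLP-Mixer-style mixing mentioned above (a token-mixing MLP followed by a channel-mixing MLP), written in PyTorch. This is not taken from the MT-UNet code base; names such as `MixerBlock`, `num_tokens`, and `dim`, and the hidden sizes, are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Token mixing followed by channel mixing, MLP-Mixer style."""
    def __init__(self, num_tokens: int, dim: int,
                 token_hidden: int = 256, channel_hidden: int = 512):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Token mixing: applied along the token axis (after a transpose),
        # so information is exchanged across spatial positions.
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, token_hidden),
            nn.GELU(),
            nn.Linear(token_hidden, num_tokens),
        )
        self.norm2 = nn.LayerNorm(dim)
        # Channel mixing: a per-token MLP over the feature dimension.
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden),
            nn.GELU(),
            nn.Linear(channel_hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim), e.g. the concatenated attention outputs.
        y = self.norm1(x).transpose(1, 2)          # (batch, dim, num_tokens)
        x = x + self.token_mlp(y).transpose(1, 2)  # token mixing + residual
        x = x + self.channel_mlp(self.norm2(x))    # channel mixing + residual
        return x

# Example usage: mix a (batch=2, tokens=196, dim=64) feature map.
if __name__ == "__main__":
    block = MixerBlock(num_tokens=196, dim=64)
    out = block(torch.randn(2, 196, 64))
    print(out.shape)  # torch.Size([2, 196, 64])
```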

OK, thank you for your answer.

May I ask:
is the ACDC split consistent with the one used by TransUnet and SwinUnet?

May I ask: is the ACDC split consistent with the one used by TransUnet and SwinUnet?

Yes, we used the same split for TransUnet and Swin-Unet, which is why we had to rerun all the experiments on ACDC. As far as we know, Swin-Unet itself uses a different split on ACDC, since the authors of TransUnet didn't provide the preprocessed ACDC dataset.

This issue has been closed since there has been no further activity for a while.