About implementation of MLP
root116688 opened this issue · 4 comments
root116688 commented
Thanks for sharing the code.
I have some questions.
- Why use Conv1d to implement the MLP?
- Why not use Linear to implement the MLP?
- When will it be greater than 1, and what does this parameter mean?
Thanks!
ma-xu commented
Hi, thanks for your interest.
For Q1 and Q2, the two are mathematically equivalent.
For Q3, we didn't use groups, so the parameter should always be 1. We implemented it for additional experiments.
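A minimal sketch of the equivalence, assuming PyTorch and made-up sizes (`in_ch`, `out_ch`, and the input shape are illustrative, not taken from the repository). A Conv1d with kernel size 1 applies the same affine map at every position, so copying its parameters into a Linear layer should reproduce the output once the channel axis is moved last. The final lines show what `groups` changes: each output channel then reads only `in_ch / groups` input channels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

in_ch, out_ch = 8, 16  # illustrative sizes, not from the repository

conv = nn.Conv1d(in_ch, out_ch, kernel_size=1)
linear = nn.Linear(in_ch, out_ch)

# Conv1d stores its weight as (out_ch, in_ch, kernel_size); with
# kernel_size=1 the last axis can be dropped to get Linear's (out_ch, in_ch).
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))
    linear.bias.copy_(conv.bias)

x = torch.randn(2, in_ch, 32)  # (batch, channels, points)

y_conv = conv(x)                                   # (2, out_ch, 32)
y_lin = linear(x.transpose(1, 2)).transpose(1, 2)  # Linear acts on the last axis

outputs_match = torch.allclose(y_conv, y_lin, atol=1e-5)

# With groups > 1 the channels are split into independent groups, so the
# layer is no longer a single dense Linear over all input channels.
grouped = nn.Conv1d(in_ch, out_ch, kernel_size=1, groups=4)
grouped_weight_shape = tuple(grouped.weight.shape)  # (out_ch, in_ch // groups, 1)
```

With `groups=1` the weight has shape `(out_ch, in_ch, 1)`, which is why the layer matches a plain Linear in that case.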
root116688 commented
Thanks a lot. With Linear and Conv1d (kernel size = 1),
the output feature values seem to be equal,
but are the weight and bias initialization or the backward pass different? Or exactly the same?
ma-xu commented
They may differ, but that makes almost no difference to the final results.
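One way to check this empirically is sketched below, again assuming PyTorch and illustrative sizes. It gives both layers the same parameters and compares the gradients from one backward pass; for the same mathematical function these should agree up to floating-point rounding. It also checks that the default-initialized weights of both layers fall inside the same range, 1/sqrt(fan_in). That range is what the default initializers produce in the PyTorch versions I am aware of, so treat it as an assumption that may vary by version.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

in_ch, out_ch = 8, 16  # illustrative sizes, not from the repository

conv = nn.Conv1d(in_ch, out_ch, kernel_size=1)
linear = nn.Linear(in_ch, out_ch)

# Assumed default-init range: for kernel_size=1 both layers have
# fan_in = in_ch, so weights should lie within +/- 1/sqrt(in_ch).
bound = 1.0 / math.sqrt(in_ch) + 1e-6  # small slack for float32 rounding
conv_init_in_range = bool(conv.weight.abs().max() <= bound)
linear_init_in_range = bool(linear.weight.abs().max() <= bound)

# Share parameters so that only the backward pass is being compared.
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))
    linear.bias.copy_(conv.bias)

x = torch.randn(2, in_ch, 32)  # (batch, channels, points)

# Same scalar loss on both paths; summing makes the axis order irrelevant.
conv(x).pow(2).sum().backward()
linear(x.transpose(1, 2)).pow(2).sum().backward()

weight_grads_match = torch.allclose(
    conv.weight.grad.squeeze(-1), linear.weight.grad, rtol=1e-4, atol=1e-4
)
bias_grads_match = torch.allclose(
    conv.bias.grad, linear.bias.grad, rtol=1e-4, atol=1e-4
)
```

Any remaining difference between the two layers in training would then come from the random draws at initialization, not from the forward or backward computation.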
root116688 commented
Thanks for your reply!