google-research/nested-transformer

Discrepancies vs Table A1 in paper

alexander-soare opened this issue · 3 comments

I noticed some possible discrepancies between the architecture parameters here and those in Table A1 of the paper.


For ImageNet models, is it correct that:

  1. The table should say h=[3,3,4]?
  2. The order of the scale_hidden_dims in the table is inverted? That is, hierarchies 1, 2, and 3 should read [4d, 4h] × 2 (1 block); [2d, 2h] × 2 (4 blocks); [d, h] × k (16 blocks)?

Hi @alexander-soare, I think you are right. Thanks for spotting this! We will correct it in the next version.

@zizhaozhang thanks for confirming! I'm working on a PyTorch implementation.

@zizhaozhang FYI I just finished converting the weights, and I can also confirm that the $a$ values (number of transformer layers) are reversed too. So it should be [4d, 4h] × k (1 block); [2d, 2h] × 2 (4 blocks); [d, h] × 2 (16 blocks).
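For anyone else cross-checking weights, the corrected per-hierarchy layout above can be written out as a tiny helper. This is my own illustrative sketch (the name `nest_hierarchy_spec` and the tuple layout are hypothetical, not from the repo), assuming hierarchy 1 is the coarsest level (1 block) as listed in the comment:

```python
def nest_hierarchy_spec(d, h, k):
    """Per-hierarchy (embed_dim, num_heads, num_layers, num_blocks) tuples,
    following the corrected ordering discussed in this issue.
    d = base embed dim, h = base head count, k = layer count of hierarchy 1.
    """
    return [
        (4 * d, 4 * h, k, 1),   # hierarchy 1: [4d, 4h] x k, 1 block
        (2 * d, 2 * h, 2, 4),   # hierarchy 2: [2d, 2h] x 2, 4 blocks
        (d,     h,     2, 16),  # hierarchy 3: [d, h]   x 2, 16 blocks
    ]

# Example with illustrative values d=96, h=3, k=8 (not taken from the paper):
print(nest_hierarchy_spec(96, 3, 8))
```

Each tuple can then be compared directly against the shapes of the converted checkpoint tensors at that hierarchy.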