google-research/nested-transformer

Discrepancies vs Table A1 in paper

alexander-soare opened this issue · 3 comments

I noticed some possible discrepancies between the architecture parameters here and those in Table A1 of the paper.


For ImageNet models, is it correct that:

  1. The table should say h=[3,3,4]?
  2. The order of the scale_hidden_dims in the table is inverted? That is, hierarchies 1, 2, and 3 should read [4d, 4h] × 2 (1 block); [2d, 2h] × 2 (4 blocks); [d, h] × k (16 blocks)?

Hi @alexander-soare, I think you are right. Thanks for spotting this! We will correct it in the next version.

@zizhaozhang thanks for confirming! I'm working on a PyTorch implementation.

@zizhaozhang FYI I just finished converting the weights, and I can also confirm that the $a$ values (number of transformer layers) are reversed too. So it should be [4d, 4h] × k (1 block); [2d, 2h] × 2 (4 blocks); [d, h] × 2 (16 blocks).
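For anyone else cross-checking weights, the corrected per-hierarchy layout above can be written out as a tiny helper. This is my own illustrative sketch (the name `nest_hierarchy_spec` and the tuple layout are hypothetical, not from the repo), assuming hierarchy 1 is the coarsest level (1 block) as listed in the comment:

```python
def nest_hierarchy_spec(d, h, k):
    """Per-hierarchy (embed_dim, num_heads, num_layers, num_blocks) tuples,
    following the corrected ordering discussed in this issue.
    d = base embed dim, h = base head count, k = layer count of hierarchy 1.
    """
    return [
        (4 * d, 4 * h, k, 1),   # hierarchy 1: [4d, 4h] x k, 1 block
        (2 * d, 2 * h, 2, 4),   # hierarchy 2: [2d, 2h] x 2, 4 blocks
        (d,     h,     2, 16),  # hierarchy 3: [d, h]   x 2, 16 blocks
    ]

# Example with illustrative values d=96, h=3, k=8 (not taken from the paper):
print(nest_hierarchy_spec(96, 3, 8))
```

Each tuple can then be compared directly against the shapes of the converted checkpoint tensors at that hierarchy.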