tgc1997/RMN

a bug report


Hi Ganchao, here is a bug report.
While debugging and reproducing the project, I found an error that might lead to inaccurate model reproduction or training results.
The att_size=1024 set on the command line does not take effect, for the following reason:
although the att_size parameter of the __init__ function of the SoftAttention and GumbelAttention classes is meant to receive opt.att_size (=1024), every instantiation actually passes opt.hidden_size (=512 for the MSVD dataset, =1300 for the MSR-VTT dataset) in its place.
Related code lines:

def __init__(self, feat_size, hidden_size, att_size):

def __init__(self, feat_size, hidden_size, att_size):

self.spatial_attn = SoftAttention(opt.region_projected_size, opt.hidden_size, opt.hidden_size)

self.temp_attn = SoftAttention(feat_size, opt.hidden_size, opt.hidden_size)

self.spatial_attn = SoftAttention(region_feat_size, opt.hidden_size, opt.hidden_size)

self.relation_attn = SoftAttention(2*feat_size, opt.hidden_size, opt.hidden_size)

self.cell_attn = SoftAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size)

self.module_attn = SoftAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size)

self.module_attn = GumbelAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size)

If the att_size parameter is meant to work as expected, the opt.hidden_size arguments in the code lines above should be replaced by opt.att_size. Is that right?
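For illustration, taking the first instantiation above as an example, the fixed line would read (the other lines would change the same way):

self.spatial_attn = SoftAttention(opt.region_projected_size, opt.hidden_size, opt.att_size)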
Thanks!

Thanks for your careful findings. In the __init__ function of the SoftAttention and GumbelAttention classes, we do not use opt.att_size; we use the att_size argument, which is assigned opt.hidden_size. In fact, there is no opt.att_size in our initial implementation; we simply use opt.hidden_size as att_size. We think this has little impact on the results, but if you find anything different, you can report your findings here.
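For context, here is a minimal sketch of how the att_size argument typically enters such a module; the layer names and shapes below are illustrative assumptions, not the project's exact code:

import torch.nn as nn

class SoftAttention(nn.Module):
    # Illustrative sketch only: att_size sets the width of the
    # intermediate attention space, so it determines the shapes of
    # these projection layers (and hence the parameter count).
    def __init__(self, feat_size, hidden_size, att_size):
        super(SoftAttention, self).__init__()
        self.feat_proj = nn.Linear(feat_size, att_size)      # project features
        self.hidden_proj = nn.Linear(hidden_size, att_size)  # project decoder state
        self.score = nn.Linear(att_size, 1)                  # scalar attention score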

Thank you very much.
I modified att_size from 512 to 1024 for the MSVD dataset, and the result is shown below:
B@4 | METEOR | ROUGE_L | CIDEr
52.8 | 35.9 | 72.1 | 90.7
The above result suggests that the modification degrades model performance.
Note that, as you mentioned in the README, the results are not stable, and I did not rerun this experiment due to time constraints.
I also reproduced the original setting (att_size equal to hidden_size, i.e. 512 for MSVD) three times, with the following results:
Run | B@4 | METEOR | ROUGE_L | CIDEr
1st | 52.0 | 35.8 | 72.0 | 89.1
2nd | 53.4 | 36.4 | 73.5 | 94.8
3rd | 55.7 | 37.2 | 73.7 | 101.0
Could the unstable results be caused by the random initialization?
Result stability is indeed another noteworthy and interesting issue.
Some people prefer to report the best results, while others prefer the mean.
Do you have any suggestions on how to compare results fairly?
I also found that the best-CIDEr/METEOR epoch and the corresponding model weights saved during training are sometimes not actually the best ones; that is, the evaluation results can be better when loading the weight files of other epochs at evaluation time.
Maybe this is because the best CIDEr/METEOR epoch is selected on the validation data during training rather than on the test data. Is that right?
Anyway, I just wanted to confirm whether the att_size parameter takes effect, and to share my findings along with some questions that confused me.
I really appreciate your patient reply. Thanks again!
Looking forward to hearing from you!

If you want to get stable results, you can fix the seed for all random number generators, for example:

import random
import numpy as np
import torch

def setup_seed(seed):
    # Seed every source of randomness used during training.
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN to pick deterministic algorithms.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.enabled = True
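For example (the seed value here is arbitrary), call it once at the start of the training script, before the dataloaders and the model are built:

setup_seed(42)  # arbitrary seed; run before any random operation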

But you may need to try several seeds to get a good result.
Also, the best CIDEr/METEOR model on the validation split may not be the best on the test split.
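If you want to check this, you can evaluate every saved checkpoint on the test split. Here is a rough sketch, where build_model, evaluate, opt, and the checkpoints/ directory are hypothetical placeholders for the project's own code and layout:

import glob
import torch

# Hypothetical sweep: score each saved checkpoint on the test split
# and keep the best one. build_model and evaluate stand in for the
# project's own model constructor and evaluation routine.
best_cider, best_ckpt = -1.0, None
for ckpt_path in sorted(glob.glob('checkpoints/*.pth')):
    model = build_model(opt)
    model.load_state_dict(torch.load(ckpt_path, map_location='cpu'))
    metrics = evaluate(model, split='test')  # assumed to return a dict of scores
    if metrics['CIDEr'] > best_cider:
        best_cider, best_ckpt = metrics['CIDEr'], ckpt_path
print('best checkpoint:', best_ckpt, 'CIDEr:', best_cider)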

Thanks for your valuable insights and detailed explanation of the model training phase.
The project and its elegant code really impressed me.
Thanks again!