About Stanford dogs accuracy
EdwinKuo1337 opened this issue · 2 comments
Hi, could you release your training settings for the Stanford Dogs dataset? I set the learning rate to 3e-3 and left the other settings unchanged, but the model underfits: I only get 1.7% accuracy after 200k steps.
I used 4 Tesla V100 (32 GB) GPUs with batch_size=14 (16 runs out of memory) to reproduce the Stanford Dogs result and kept the rest of the config the same as in your paper, but the accuracy is only 90.5, well below the 92.3 reported in the paper.
Hi Oliver, just wondering if you can share your pretrained model with me? Thanks in advance!
Thanks for your reply. I use the pretrained model "VIT_B16" downloaded from your link. By the way, I removed "part_select" and "part_layer" (so the model is like a pure ViT), and the performance is similar to the 90.5 I got when reproducing TransFG.
```python
import copy

import torch.nn as nn
from torch.nn import LayerNorm

# Block (and Part_Attention) are defined alongside Encoder in the TransFG model code.


class Encoder(nn.Module):
    def __init__(self, config):
        super(Encoder, self).__init__()
        self.layer = nn.ModuleList()
        # Builds num_layers - 1 blocks; the commented-out part_layer was the final block.
        for _ in range(config.transformer["num_layers"] - 1):
            layer = Block(config)
            self.layer.append(copy.deepcopy(layer))
        # self.part_select = Part_Attention()
        # self.part_layer = Block(config)
        self.part_norm = LayerNorm(config.hidden_size, eps=1e-6)

    def forward(self, hidden_states):
        # attn_weights = []
        for layer in self.layer:
            hidden_states, _ = layer(hidden_states)
            # attn_weights.append(weights)
        # part_num, part_inx = self.part_select(attn_weights)
        # part_inx = part_inx + 1
        # parts = []
        # B, num = part_inx.shape
        # for i in range(B):
        #     parts.append(hidden_states[i, part_inx[i,:]])
        # parts = torch.stack(parts).squeeze(1)
        # concat = torch.cat((hidden_states[:,0].unsqueeze(1), parts), dim=1)
        # part_states, part_weights = self.part_layer(concat)
        # part_encoded = self.part_norm(part_states)
        # Normalize the full token sequence instead of the selected parts.
        part_encoded = self.part_norm(hidden_states)
        return part_encoded
```
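For anyone trying the same modification, here is a minimal sketch for sanity-checking the stripped-down encoder on its own. It is not the repo's code: `StubBlock` is a hypothetical stand-in for the real `Block` (it only mimics the `(hidden_states, weights)` return value), `PlainEncoder` is an illustrative name, and the toy config sizes are made up to keep the check fast. The thing it shows is that the loop builds `num_layers - 1` blocks, so with `part_layer` commented out the encoder has one block fewer than `num_layers` says (11 rather than 12 if `num_layers` is 12, as in standard ViT-B/16).

```python
from types import SimpleNamespace

import torch
import torch.nn as nn


class StubBlock(nn.Module):
    """Hypothetical stand-in for the repo's Block: returns (hidden_states, attn_weights)."""

    def __init__(self, config):
        super().__init__()
        self.norm = nn.LayerNorm(config.hidden_size, eps=1e-6)
        self.attn = nn.MultiheadAttention(
            config.hidden_size, config.transformer["num_heads"], batch_first=True
        )

    def forward(self, x):
        h = self.norm(x)
        out, weights = self.attn(h, h, h)
        return x + out, weights  # residual connection, like a transformer block


class PlainEncoder(nn.Module):
    """The encoder from the comment above with the part-selection path removed."""

    def __init__(self, config, block_cls=StubBlock):
        super().__init__()
        # Same loop bound as the original: num_layers - 1 blocks.
        self.layer = nn.ModuleList(
            [block_cls(config) for _ in range(config.transformer["num_layers"] - 1)]
        )
        self.part_norm = nn.LayerNorm(config.hidden_size, eps=1e-6)

    def forward(self, hidden_states):
        for layer in self.layer:
            hidden_states, _ = layer(hidden_states)
        return self.part_norm(hidden_states)


# Toy sizes, assumed only for this check (not the ViT-B/16 values).
config = SimpleNamespace(hidden_size=32, transformer={"num_layers": 4, "num_heads": 4})
encoder = PlainEncoder(config).eval()

tokens = torch.randn(2, 10, config.hidden_size)  # (batch, tokens, hidden)
with torch.no_grad():
    encoded = encoder(tokens)

print(len(encoder.layer))    # 3 blocks for num_layers=4
print(tuple(encoded.shape))  # (2, 10, 32): all tokens are kept, none selected
```

If the goal is a true pure-ViT baseline, changing the loop bound to `num_layers` would be the obvious thing to try, though whether that changes the 90.5 result is not something this sketch can tell.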