TACJu/TransFG

About Stanford dogs accuracy

EdwinKuo1337 opened this issue · 2 comments

Hi, could you release your training settings for the Stanford Dogs dataset? I set the lr to 3e-3 and left all other settings unchanged, but the model is underfitting: I only get 1.7% accuracy after 200k steps.

I used 4 Tesla V100 (32 GB) GPUs with batch_size=14 (16 runs out of memory) to reproduce the Stanford Dogs result and kept the same config as in your paper, but the accuracy is only 90.5, well below the 92.3 reported in the paper.
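Since the only stated deviation from the paper's config is the batch size (14 instead of 16), one thing worth checking is whether the learning rate should be adjusted to match. Below is a minimal sketch of the common linear scaling heuristic. It is a general rule of thumb, not something confirmed by the paper or this repository. The helper names are hypothetical, the lr of 3e-3 is the value mentioned earlier in this thread, and treating batch_size as a per-GPU value is an assumption (the 14/16 ratio is the same either way).

```python
def effective_batch_size(per_gpu_batch, num_gpus, grad_accum_steps=1):
    """Samples contributing to one optimizer step.

    Assumes batch_size is per GPU, which this thread does not confirm.
    """
    return per_gpu_batch * num_gpus * grad_accum_steps


def linearly_scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: lr grows or shrinks in proportion to batch size."""
    return base_lr * new_batch / base_batch


# Hypothetical numbers taken from this thread: 4 GPUs, 16 vs. 14 per GPU.
reference_batch = effective_batch_size(per_gpu_batch=16, num_gpus=4)
actual_batch = effective_batch_size(per_gpu_batch=14, num_gpus=4)

# 3e-3 is the lr mentioned in the first post, used here as the assumed baseline.
scaled_lr = linearly_scaled_lr(3e-3, reference_batch, actual_batch)
print(reference_batch, actual_batch, scaled_lr)
```

A 12.5% change in batch size only shifts the learning rate slightly under this heuristic, so it may not account for the whole gap on its own.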

Hi Oliver, just wondering if you can share your pretrained model with me? Thanks in advance!

Thanks for your reply. I use the pretrained model "VIT_B16" downloaded from your link. By the way, I removed "part_select" and "part_layer" (leaving a pure ViT), and the performance is similar to the TransFG result I reproduced (90.5).

```python
import copy

import torch.nn as nn
from torch.nn import LayerNorm

# Block is the transformer block defined elsewhere in this repository's model code.


class Encoder(nn.Module):
    def __init__(self, config):
        super(Encoder, self).__init__()
        self.layer = nn.ModuleList()
        # Note: with part_layer commented out below, only num_layers - 1 blocks are applied.
        for _ in range(config.transformer["num_layers"] - 1):
            layer = Block(config)
            self.layer.append(copy.deepcopy(layer))
        # Part selection and the final part layer are disabled (pure ViT).
        # self.part_select = Part_Attention()
        # self.part_layer = Block(config)
        self.part_norm = LayerNorm(config.hidden_size, eps=1e-6)

    def forward(self, hidden_states):
        # attn_weights = []
        for layer in self.layer:
            hidden_states, _ = layer(hidden_states)
            # attn_weights.append(weights)
        # part_num, part_inx = self.part_select(attn_weights)
        # part_inx = part_inx + 1
        # parts = []
        # B, num = part_inx.shape
        # for i in range(B):
        #     parts.append(hidden_states[i, part_inx[i,:]])
        # parts = torch.stack(parts).squeeze(1)
        # concat = torch.cat((hidden_states[:,0].unsqueeze(1), parts), dim=1)
        # part_states, part_weights = self.part_layer(concat)
        # part_encoded = self.part_norm(part_states)
        # Normalize all tokens directly instead of only the selected parts.
        part_encoded = self.part_norm(hidden_states)

        return part_encoded
```