guojiajeremy/Dinomaly

Questions about MLP Dropout.

Closed this issue · 1 comment

In your paper, you mention that Dropout in the MLP bottleneck is your solution to the identical-mapping problem during reconstruction.

import torch.nn as nn

class Mlp(nn.Module):
    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)   # dropout after the activation
        x = self.fc2(x)
        x = self.drop(x)   # dropout after the second linear layer
        return x

(bottleneck): ModuleList(
  (0): bMlp(
    (fc1): Linear(in_features=768, out_features=3072, bias=True)
    (act): GELU(approximate='none')
    (fc2): Linear(in_features=3072, out_features=768, bias=True)
    (drop): Dropout(p=0.2, inplace=False)
  )
)
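For reference, a minimal sketch of how such a bottleneck could be built from the Mlp class quoted above (assuming bMlp is essentially that MLP with a 4x hidden expansion and Dropout p=0.2; the exact constructor in the repository may differ):

import torch
import torch.nn as nn

# Hypothetical construction that mirrors the printed module structure above;
# it reuses the Mlp class from the question, not the repository's bMlp.
bottleneck = nn.ModuleList([
    Mlp(in_features=768, hidden_features=3072, out_features=768,
        act_layer=nn.GELU, drop=0.2)
])

x = torch.randn(2, 196, 768)    # (batch, tokens, channels), e.g. ViT-B patch features
out = bottleneck[0](x)
print(out.shape)                # torch.Size([2, 196, 768])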

This structure appears in the vanilla Transformer as well, so I'm curious what the key differences are between this MLP and the one in the vanilla Transformer.
Did I reference the wrong part?

The key difference is whether the Dropout is activated (p=0.2 by default). As discussed in the paper: "In Dinomaly, Dropout is used to discard neural activations in an MLP bottleneck randomly. Instead of alleviating overfitting, the role of Dropout in Dinomaly can be explained as feature noise and pseudo feature anomaly."
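To make this concrete, here is a small illustrative sketch (not from the repository) using the Mlp class above: with p=0.2 the bottleneck produces stochastic, perturbed features in training mode, which is the feature noise / pseudo feature anomaly described in the paper, whereas in eval mode Dropout is a no-op and the module behaves like a vanilla Transformer MLP.

import torch

mlp = Mlp(in_features=768, hidden_features=3072, drop=0.2)   # Dropout active with p=0.2
x = torch.randn(4, 196, 768)

mlp.train()                      # Dropout enabled: activations are randomly discarded
y1, y2 = mlp(x), mlp(x)
print(torch.allclose(y1, y2))    # False -- two passes differ, acting as feature noise

mlp.eval()                       # Dropout disabled: deterministic, like a vanilla MLP
y1, y2 = mlp(x), mlp(x)
print(torch.allclose(y1, y2))    # True -- no perturbation at inference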