luissen/ESRT

What does common.Scale(1) mean?


import torch
import torch.nn as nn

class Scale(nn.Module):
    """Multiplies the input by a single learnable scalar."""
    def __init__(self, init_value=1e-3):
        super().__init__()
        # A one-element learnable parameter, initialized to init_value.
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))

    def forward(self, input):
        return input * self.scale

When self.scale = 1, does this layer do nothing?
Why do we need this layer?

Is self.scale the learnable parameter λₓ in the paper?
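A quick check (repeating the class above so the snippet runs standalone; the interpretation of λₓ is my assumption, not confirmed by the authors): with init_value=1 the layer is the identity at initialization, but self.scale is still an nn.Parameter, so it receives gradients and can move away from 1 during training.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self, init_value=1e-3):
        super().__init__()
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))

    def forward(self, input):
        return input * self.scale

layer = Scale(init_value=1.0)
x = torch.ones(3)
out = layer(x)

# At initialization with init_value=1, the layer is an identity map.
print(torch.allclose(out, x))  # True

# But the scalar is learnable: it gets a gradient, so an optimizer
# can adjust it away from 1 as training proceeds.
out.sum().backward()
print(layer.scale.grad is not None)  # True
```

So the layer is only a no-op at the moment of initialization; its purpose is to let the network learn how much to weight that branch's output.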