JierunChen/FasterNet

question about formula 7

abcsimple opened this issue · 12 comments

Hi, your work is amazing!

I have a question about formula 7 in the paper. Since the feature maps with dim $c_p$ and the feature maps with dim $c - c_p$ are concatenated before the PWConv, the number of input channels of the PWConv should be $c$. Meanwhile, the PWConv has filters of shape $1 \times 1 \times c$, so I think the FLOPs of a PConv followed by a PWConv should be calculated as $h \times w \times (k^2 c_p^2 + c^2)$. Can you help me with that?
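For reference, here is a quick numeric sanity check of that count in PyTorch (a minimal sketch; the sizes below are made up for illustration). For a stride-1, same-padding conv, the multiply-accumulates equal $h \times w \times$ weight.numel():

import torch.nn as nn

# Hypothetical sizes, for illustration only.
h, w, c, k = 56, 56, 64, 3
c_p = c // 4  # e.g., a partial ratio of 1/4, as in c = 4 * c_p

pconv = nn.Conv2d(c_p, c_p, k, 1, k // 2, bias=False)  # k x k conv on c_p channels
pwconv = nn.Conv2d(c, c, 1, bias=False)                # 1 x 1 conv on all c channels

# Multiply-accumulates for a stride-1, same-padding conv: h * w * weight.numel().
flops = h * w * (pconv.weight.numel() + pwconv.weight.numel())
assert flops == h * w * (k**2 * c_p**2 + c**2)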

@abcsimple Hi, you are right. Thanks for pointing that out. We'll update the paper accordingly.

Thanks for your reply. Actually, I think the T-shaped Conv's FLOPs in formula 6 should be reconsidered as well.

The T-shaped Conv can be viewed as a grouped conv: for the $k \times k$ filters, both the input and output dims are $c_p$, and for the $1 \times 1$ filters, both the input and output dims are $c - c_p$. Therefore, the FLOPs of a T-shaped Conv should be calculated as $h \times w \times (k^2 c_p^2 + (c - c_p)^2)$.

@abcsimple Hi, for T-shaped Conv, the output dimension is c, either for $k \times k$ filters or the $1 \times 1$ filters in your way of decomposition. Therefore, formula 6 should be correct.
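Concretely, each of the $c$ filters covers $k \times k \times c_p$ positions plus $1 \times 1 \times (c - c_p)$ positions, so the count is

$$h \times w \times \left(k^2 c_p + (c - c_p)\right) \times c = h \times w \times \left(k^2 c_p c + c(c - c_p)\right),$$

which is exactly formula 6.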

@JierunChen Can you show me the code of the T-shaped Conv? It may help me understand the definition. Here is my version:

import torch
import torch.nn as nn
from torch import Tensor


class T_shaped_conv3(nn.Module):
    def __init__(self, dim, n_div):
        super().__init__()
        self.dim_head = dim // n_div           # c_p channels
        self.dim_tail = dim - self.dim_head    # c - c_p channels
        # k x k (here 3 x 3) conv over the first c_p channels
        self.t_shaped_conv3_head = nn.Conv2d(self.dim_head, self.dim_head, 3, 1, 1, bias=False)
        # 1 x 1 conv over the remaining c - c_p channels (the 1 x 1 filters above)
        self.t_shaped_conv3_tail = nn.Conv2d(self.dim_tail, self.dim_tail, 1, bias=False)

    def forward_split_cat(self, x: Tensor) -> Tensor:
        # Split into head/tail, convolve each group separately, then re-concatenate.
        x1, x2 = torch.split(x, [self.dim_head, self.dim_tail], dim=1)
        x1 = self.t_shaped_conv3_head(x1)
        x2 = self.t_shaped_conv3_tail(x2)
        x = torch.cat((x1, x2), 1)
        return x
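A quick shape check of the above (hypothetical sizes):

x = torch.randn(1, 64, 56, 56)        # made-up input: batch 1, c = 64, 56 x 56
m = T_shaped_conv3(dim=64, n_div=4)   # c_p = 16
y = m.forward_split_cat(x)
print(y.shape)                        # torch.Size([1, 64, 56, 56])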

Good question, waiting for a reply.

@JierunChen Hi, I have a question about formula 6. According to your response above, do you mean that the $k \times k$ filters have input dimension $c_p$ and output dimension $c$, and the $1 \times 1$ filters have input dimension $c - c_p$ and output dimension $c$?

@abcsimple Hi, thanks for the implementation, yet it differs from the T-shaped Conv mentioned in the paper. In your version, the input information is isolated between the "head" and "tail" channels. In a T-shaped Conv, however, the output at each location attends to all input channels simultaneously.

Indeed, we did not implement the T-shaped Conv, as it cannot be built solely from regular Conv layers but requires a dedicated effort.
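For intuition only, the decomposition described above admits a functional (though not efficient) emulation with two regular Convs whose outputs are summed. This is a sketch under that reading, not the authors' implementation, and the class name and interface are made up for illustration:

import torch
import torch.nn as nn
from torch import Tensor


class TShapedConvSketch(nn.Module):
    # Hypothetical emulation: k x k filters read the first c_p channels,
    # 1 x 1 filters read the remaining c - c_p channels, and both produce
    # all c output channels. Summing the two responses lets every output
    # location attend to all input channels, matching the T-shaped
    # receptive field described above.
    def __init__(self, dim: int, n_div: int, k: int = 3):
        super().__init__()
        self.dim_head = dim // n_div           # c_p
        self.dim_tail = dim - self.dim_head    # c - c_p
        self.conv_kxk = nn.Conv2d(self.dim_head, dim, k, 1, k // 2, bias=False)
        self.conv_1x1 = nn.Conv2d(self.dim_tail, dim, 1, bias=False)

    def forward(self, x: Tensor) -> Tensor:
        x1, x2 = torch.split(x, [self.dim_head, self.dim_tail], dim=1)
        return self.conv_kxk(x1) + self.conv_1x1(x2)

Counting multiply-accumulates here gives $h \times w \times (k^2 c_p c + c(c - c_p))$, i.e., formula 6.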

@xglllll Yes, exactly for the T-shaped Conv.

@JierunChen Thank you for your reply, I understand it now.

Thanks for your response! Let's get back to the original question:

If the FLOPs of a PConv plus a PWConv (formula 7) should be calculated as $h \times w \times (k^2 c_p^2 + c^2)$, then since $k^2 c_p c$ is higher than $k^2 c_p^2$ but $c(c - c_p)$ is lower than $c^2$, how can we prove that the FLOPs of a T-shaped Conv (formula 6) are higher than those of a PConv plus a PWConv (formula 7)? Thank you.

@abcsimple The FLOPs of a T-shaped Conv (formula 6) are higher than those of a PConv and a PWConv (formula 7) if $(k^{2} - 1)c > k^{2} c_p$, which holds in most cases, e.g., when $c = 4c_p$ and $k = 3$.
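To see where the condition comes from, subtract formula 7 from formula 6:

$$h w \left(k^2 c_p c + c(c - c_p)\right) - h w \left(k^2 c_p^2 + c^2\right) = h w \, c_p \left((k^2 - 1)c - k^2 c_p\right),$$

which is positive exactly when $(k^2 - 1)c > k^2 c_p$. For example, with $c = 4c_p$ and $k = 3$, the condition reads $8 \cdot 4c_p = 32c_p > 9c_p$.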

Great! Thank you for your explanation.