Grouped Conv2D optimization when groups equal input channels

When the number of groups equals the number of input channels, which is common in networks, GroupedConv2d is essentially a Depthwise Conv2D (with a depth multiplier of 1 when the output channels also equal the input channels).

GroupedConv2d is currently implemented as a stack of Conv2D ops, one per iteration of the group loop, which is not efficient when the number of groups is large (32, 64, 128, etc.). See TIM-VX/src/tim/vx/internal/src/ops/vsi_nn_op_grouped_conv2d.c, line 148 in 3e8d5e3:

for (i = 0; i < nn_param->group; i++)

Would it be a good idea to optimize Grouped Conv2D into Depthwise Conv2D when the number of input channels equals the number of groups?
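To make the equivalence concrete, here is a minimal sketch in plain C. It does not use the TIM-VX or ovxlib API; all function names, the toy sizes, the NCHW layout, stride 1 and zero padding are assumptions chosen for illustration. It builds a grouped convolution with groups == input channels as one single-channel Conv2D per group, and compares it with a depthwise convolution (depth multiplier 1) over the same data.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sizes (assumed for illustration): NCHW, batch 1, stride 1, no padding. */
enum { CH = 4, H = 3, W = 3, K = 2, OH = H - K + 1, OW = W - K + 1 };

/* Ordinary Conv2D with one input channel and one output channel. */
static void conv2d_1ch(const float *in, const float *w, float *out)
{
    for (int oy = 0; oy < OH; oy++) {
        for (int ox = 0; ox < OW; ox++) {
            float acc = 0.0f;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    acc += in[(oy + ky) * W + (ox + kx)] * w[ky * K + kx];
            out[oy * OW + ox] = acc;
        }
    }
}

/* Grouped conv with groups == CH, built as one Conv2D call per group. */
static void grouped_conv_stacked(const float *in, const float *w, float *out)
{
    for (int g = 0; g < CH; g++)
        conv2d_1ch(in + g * H * W, w + g * K * K, out + g * OH * OW);
}

/* Depthwise conv (depth multiplier 1): all channels in a single op. */
static void depthwise_conv(const float *in, const float *w, float *out)
{
    for (int c = 0; c < CH; c++)
        for (int oy = 0; oy < OH; oy++)
            for (int ox = 0; ox < OW; ox++) {
                float acc = 0.0f;
                for (int ky = 0; ky < K; ky++)
                    for (int kx = 0; kx < K; kx++)
                        acc += in[(c * H + oy + ky) * W + (ox + kx)]
                             * w[(c * K + ky) * K + kx];
                out[(c * OH + oy) * OW + ox] = acc;
            }
}

/* Returns true when both formulations give identical outputs on sample data. */
bool grouped_equals_depthwise(void)
{
    float in[CH * H * W], w[CH * K * K];
    float a[CH * OH * OW], b[CH * OH * OW];

    assert(OH > 0 && OW > 0); /* the toy sizes must give a non-empty output */
    for (int i = 0; i < CH * H * W; i++) in[i] = (float)(i % 7);
    for (int i = 0; i < CH * K * K; i++) w[i] = (float)(i % 3) - 1.0f;

    grouped_conv_stacked(in, w, a);
    depthwise_conv(in, w, b);

    for (int i = 0; i < CH * OH * OW; i++)
        if (a[i] != b[i]) return false;
    return true;
}
```

The arithmetic is the same in both paths; the difference is that the stacked version issues one op per group, while the depthwise version is a single op regardless of channel count.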

Hi FengWang,

In that case, input_channel equals the group number, so you should map it to depthwise conv directly.
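On the caller side, the check for this case could look roughly like the sketch below. The helper name and signature are hypothetical and not part of TIM-VX; it only tests whether a grouped convolution can be expressed as a depthwise one and, if so, reports the depth multiplier to use (output channels divided by groups).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a TIM-VX function.
 * Returns true when a grouped conv with the given shape can be mapped to a
 * depthwise conv, and stores the required depth multiplier in *multiplier. */
bool can_map_to_depthwise(int in_channels, int out_channels, int groups,
                          int *multiplier)
{
    assert(multiplier != NULL);

    /* Depthwise means exactly one input channel per group. */
    if (groups <= 0 || in_channels != groups)
        return false;

    /* Each group must produce a whole number of output channels. */
    if (out_channels <= 0 || out_channels % groups != 0)
        return false;

    *multiplier = out_channels / groups;
    return true;
}
```

For example, 32 input channels, 32 output channels and 32 groups gives a multiplier of 1, while 64 input channels with 32 groups is rejected and would stay a regular grouped convolution.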