JierunChen/FasterNet

What is the difference between 'slicing' mode and 'split_cat' mode?

cswry opened this issue · 1 comment

cswry commented

Hello, thank you for your simple but powerful work.
I have a question about Partial_conv3.
I tested the inference speed in 'slicing' mode and in 'split_cat' mode and found no difference between them.

Hi, thanks for your interest in this work.

The "slicing" mode without x = x.clone() in its implementation can be faster than the "split_cat" mode, particularly on GPUs and at large input resolutions, because "slicing" requires fewer memory accesses.
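To make the two modes concrete, here is a minimal sketch, not the repository code. The class name `PartialConvSketch`, the default `n_div=4`, and the tensor sizes are assumptions chosen for illustration. It shows that both modes compute the same result. "split_cat" builds a new output tensor with `torch.cat`, which also copies the untouched channels, while in-place "slicing" only writes the convolved channels.

```python
import torch
import torch.nn as nn


class PartialConvSketch(nn.Module):
    """Hypothetical PConv sketch: a 3x3 conv on the first dim // n_div channels only."""

    def __init__(self, dim: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_untouched = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, 1, 1, bias=False)

    def forward_slicing(self, x: torch.Tensor) -> torch.Tensor:
        # In-place variant without x.clone(): overwrites the first channels of x itself.
        # Intended for inference only.
        x[:, :self.dim_conv] = self.conv(x[:, :self.dim_conv])
        return x

    def forward_split_cat(self, x: torch.Tensor) -> torch.Tensor:
        # Out-of-place variant: torch.cat allocates a new tensor and copies x2 into it.
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)


torch.manual_seed(0)
pconv = PartialConvSketch(dim=16).eval()
inp = torch.randn(2, 16, 8, 8)

with torch.no_grad():
    out_cat = pconv.forward_split_cat(inp)
    # Pass a copy here only so that inp stays available for the comparison.
    out_slice = pconv.forward_slicing(inp.clone())

same = torch.allclose(out_cat, out_slice)
print("outputs match:", same)
```

To compare latency, time the two forward methods separately on your target device. On a GPU, call `torch.cuda.synchronize()` before reading the timer.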

However, when PConv is placed right after the start of a residual shortcut, the in-place "slicing" implementation would modify the tensor that the shortcut still refers to. Therefore, x = x.clone() is introduced in PConv's "slicing" mode implementation. The copy adds latency, which is why there is no significant difference between the "slicing" mode and the "split_cat" mode.
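The shortcut problem can be reproduced in a few lines. This is a sketch under assumptions: the helper `pconv_slicing`, its `clone` flag, and the channel counts are hypothetical and only serve to show the aliasing. Without the clone, the shortcut and the PConv input share storage, so the in-place write changes the shortcut too.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(4, 4, 3, 1, 1, bias=False).eval()


def pconv_slicing(x: torch.Tensor, clone: bool) -> torch.Tensor:
    # Hypothetical helper: in-place partial conv on the first 4 channels.
    if clone:
        x = x.clone()  # protects the caller's tensor at the cost of an extra copy
    x[:, :4] = conv(x[:, :4])
    return x


with torch.no_grad():
    x = torch.randn(1, 16, 8, 8)
    reference = x.clone()

    # Case 1: no clone. The shortcut is the same tensor object as x.
    shortcut = x
    pconv_slicing(x, clone=False)
    shortcut_corrupted = not torch.equal(shortcut, reference)

    # Case 2: with clone. The shortcut keeps its original values.
    x = reference.clone()
    shortcut = x
    y = pconv_slicing(x, clone=True)
    shortcut_intact = torch.equal(shortcut, reference)

print("shortcut modified without clone:", shortcut_corrupted)
print("shortcut preserved with clone:", shortcut_intact)
```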

To conclude, please use the "split_cat" mode when PConv is placed right after the start of a residual shortcut. Otherwise, you may use the "slicing" mode without x = x.clone() for faster inference.