Difference
Your code in ShuffleNetV1/blocks.py:
x = self.branch_main_1(x)
if self.group > 1:
    x = self.channel_shuffle(x)
x = self.branch_main_2(x)
Here the channel_shuffle operation is applied after the conv3x3 operation. In the paper, however, the channel_shuffle operation occurs before the conv3x3 operation.
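To make the ordering I mean concrete, here is a minimal pure-Python sketch. It is not the code in blocks.py: `channel_shuffle` works on a plain list of channel labels instead of a tensor, and `forward_paper_order` with its stage arguments (`gconv1x1_a`, `dwconv3x3`, `gconv1x1_b`) are hypothetical names I made up for illustration. The shuffle follows the paper's description (reshape the channels to (groups, n), transpose, flatten).

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups: view as (groups, n), transpose, flatten."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    n = len(channels) // groups
    # rows[g] holds the channels that belong to group g
    rows = [channels[g * n:(g + 1) * n] for g in range(groups)]
    # Read column by column, which is the transpose followed by a flatten
    return [rows[g][i] for i in range(n) for g in range(groups)]


def forward_paper_order(x, gconv1x1_a, dwconv3x3, gconv1x1_b, groups):
    """Hypothetical unit: the shuffle sits between the first 1x1 conv and the 3x3 conv."""
    x = gconv1x1_a(x)
    if groups > 1:
        x = channel_shuffle(x, groups)  # shuffle BEFORE the 3x3 conv
    x = dwconv3x3(x)
    return gconv1x1_b(x)


print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=3))  # → [0, 2, 4, 1, 3, 5]
```

In the snippet from blocks.py above, the shuffle would instead be called after the stage that contains the 3x3 convolution.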