onnx/onnx-coreml

Error while converting op of type: BatchNormalization

ekam123 opened this issue · 3 comments

❓Question

I am using the fastai library with ResNet34 as the pretrained model. The model converts to ONNX fine, but when I try to convert the ONNX model to an mlmodel I get the following error:

[screenshot of the BatchNormalization conversion error]

The error only shows up when the final layers include a concat of the two pooling layers:

[screenshot of the model head that uses AdaptiveConcatPool2d]

This is the function definition:

"Layer that concats AdaptiveAvgPool2d and AdaptiveMaxPool2d."
"Output will be 2*sz or 2 if sz is None"
class AdaptiveConcatPool2d(Module): def __init__(self, sz:Optional[int]=None): self.output_size = sz or 1 self.ap = nn.AdaptiveAvgPool2d(self.output_size) self.mp = nn.AdaptiveMaxPool2d(self.output_size) def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)
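For reference, a quick shape check (the channel count assumes a ResNet34 backbone with a 224×224 input) shows that the concat doubles the channel dimension and that, after flattening, the following BatchNorm sees a rank-2 tensor:

```python
import torch

pool = AdaptiveConcatPool2d()         # sz=None, so output_size=1
x = torch.rand(2, 512, 7, 7)          # simulated ResNet34 backbone output
y = pool(x)
print(y.shape)                        # torch.Size([2, 1024, 1, 1])
print(torch.flatten(y, 1).shape)      # torch.Size([2, 1024]) -- rank 2
```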

The conversion works fully if I remove the concat and make the final layer like this:

[screenshot of the model head without the concat]

What exactly does the error mean, and why does the problem show up with BatchNormalization? Does this mean I have to implement a custom layer for ONNX because AdaptiveConcatPool2d is not supported?
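For context, this is roughly the conversion pipeline I'm running (file names and the input shape are placeholders):

```python
import torch
from onnx_coreml import convert

# export the trained fastai model to ONNX
# (learn is the fastai Learner returned by cnn_learner)
learn.model.eval()
dummy_input = torch.rand(1, 3, 224, 224)
torch.onnx.export(learn.model, dummy_input, "model.onnx")

# this second step fails with the BatchNormalization error above
mlmodel = convert(model="model.onnx")
mlmodel.save("model.mlmodel")
```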

System Information

fastai/PyTorch
coremltools version: 3.1
onnx-coreml version: 1.1.0
OS: macOS
How you installed Python: conda
Python version: 3.6.9

@ekam123 Did you find a solution? How does removing the concat affect performance?

I had the same problem. My workaround was to modify the code of `_convert_bn` so that it adds a BatchNorm layer even when the rank is detected to be 0, using the last axis for expansion:

```python
# Modified call inside _convert_bn: instead of rejecting the op when the
# input rank comes back as 0, expand along the last axis (-1).
add_bn_with_expansion(builder, node, err, node.name, node.inputs[0], node.outputs[0], channels[0],
                      scale, bias, mean, var, epsilon, axes_for_expansion=[-1])
```

I don't know why the rank is 0 in this case, or why this situation isn't already handled by the BatchNorm conversion code, but the change above worked for me without having to remove the concat op.
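If you want to try the same change, you can locate the installed file to edit from Python. A minimal sketch, assuming `_convert_bn` still lives in `onnx_coreml/_operators_nd.py` as it did in 1.1.0:

```python
import inspect
import onnx_coreml._operators_nd as ops

# prints the path of the installed module that defines _convert_bn
print(inspect.getsourcefile(ops))
```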

Edit: fixed the name of the function in my link to the source code (the link was correct).

Thanks @pcuenca, I'll definitely try it out.
@rsomani95 I replaced the concat with just `nn.AdaptiveAvgPool2d` and got basically the same performance:

```python
from fastai.vision import *  # provides nn, cnn_learner, models, accuracy; data is my DataBunch

final_layer = nn.Sequential(nn.AdaptiveAvgPool2d(output_size=1),
                            nn.Flatten(),
                            nn.BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
                            nn.Dropout(p=0.5, inplace=False),
                            nn.Linear(in_features=2048, out_features=512, bias=True),
                            nn.ReLU(inplace=True),
                            nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
                            nn.Dropout(p=0.5, inplace=False),
                            nn.Linear(in_features=512, out_features=46, bias=True))

learn = cnn_learner(data, models.resnet50, metrics=accuracy, custom_head=final_layer)
```
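A quick shape-only sanity check of the head (the 2048 input channels match ResNet50's final feature maps now that the concat doubling is gone; the input shape is a stand-in):

```python
import torch

feats = torch.rand(2, 2048, 7, 7)   # simulated ResNet50 backbone output
out = final_layer(feats)
print(out.shape)                    # torch.Size([2, 46])
```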