MLRichter/receptive_field_analysis_toolbox

Sizes of tensors must match except in dimension 1. Expected size 26 but got size 25 for tensor number 1 in the list.

When loading a model from torch.hub I am getting the following error: Sizes of tensors must match except in dimension 1. Expected size 26 but got size 25 for tensor number 1 in the list

Minimal working example:

import torch
from rfa_toolbox import create_graph_from_pytorch_model, visualize_architecture

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
graph = create_graph_from_pytorch_model(model)

Please see for more info: issues/6455

This is caused by control flow within the forward pass of YoloV5. RFA-Toolbox uses the PyTorch JIT compiler to extract the graph of a model, and a trace only records the branches of the control flow that the forward pass actually takes during tracing.
For some reason, this requires any YoloV5 model to be in train mode in order to be traceable by the JIT compiler of PyTorch.
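
To illustrate what "only the branches the forward pass takes" means, here is a toy sketch in plain Python. It is not PyTorch's tracer and does not reproduce YoloV5; `Recorded`, `forward`, and `toy_trace` are hypothetical names made up for this illustration.

```python
class Recorded:
    """Toy stand-in for a traced tensor: logs every operation applied to it."""

    def __init__(self, value, ops):
        self.value = value
        self.ops = ops  # shared list of recorded operation names

    def __mul__(self, factor):
        self.ops.append(f"mul {factor}")
        return Recorded(self.value * factor, self.ops)

    def __add__(self, offset):
        self.ops.append(f"add {offset}")
        return Recorded(self.value + offset, self.ops)


def forward(x, training):
    """Hypothetical model whose forward pass branches on a Python flag."""
    if training:
        return x * 2      # train mode: raw output only
    return x * 2 + 1      # eval mode: one extra step


def toy_trace(fn, example, **flags):
    """Run fn once on an example input and return the operations it touched."""
    ops = []
    fn(Recorded(example, ops), **flags)
    return ops


train_graph = toy_trace(forward, 3, training=True)   # ['mul 2']
eval_graph = toy_trace(forward, 3, training=False)   # ['mul 2', 'add 1']
```

The `if` never shows up in either recorded list, only the operations on the branch that ran. That is why the mode the model is in at trace time decides which graph is extracted.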

If you put the model in train mode, it works.
Here is some example code that fixed the issue:


import torch
from rfa_toolbox import create_graph_from_pytorch_model, visualize_architecture

# Model
model_name = "YoloV5s"
model = torch.hub.load('ultralytics/yolov5', f'{model_name.lower()}')  # or yolov5m, yolov5l, yolov5x, custom
#model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.onnx')  # or yolov5m, yolov5l, yolov5x, custom
model.train()
graph = create_graph_from_pytorch_model(model.cpu(), (4, 3, 640, 640))
visualize_architecture(graph, model_name=f"{model_name}", input_res=10000).render(f"{model_name}")

@MLRichter this does not solve the problem for me; it results in the same error again. Please have a look at the log:

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-1-31 torch 1.10.1+cu113 CUDA:0 (NVIDIA GeForce RTX 3090, 24268MiB)

Fusing layers... 
Model Summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_3645/3417301980.py in <module>
      7 #model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.onnx')  # or yolov5m, yolov5l, yolov5x, custom
      8 model.train()
----> 9 graph = create_graph_from_pytorch_model(model.cpu(), (4, 3, 640, 640))
     10 visualize_architecture(graph, model_name=f"{model_name}", input_res=10000).render(f"{model_name}")

/opt/conda/lib/python3.8/site-packages/rfa_toolbox/encodings/pytorch/ingest_architecture.py in create_graph_from_model(model, filter_rf, input_res, custom_layers)
    411         else KNOWN_FILTER_MAPPING[filter_rf]
    412     )
--> 413     tm = torch.jit.trace(model, (torch.randn(*input_res),))
    414     return make_graph(
    415         tm, filter_rf=filter_func, ref_mod=model, classes_to_not_visit=custom_layers

/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py in trace(func, example_inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit)
    739 
    740     if isinstance(func, torch.nn.Module):
--> 741         return trace_module(
    742             func,
    743             {"forward": example_inputs},

/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py in trace_module(mod, inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit)
    956             example_inputs = make_tuple(example_inputs)
    957 
--> 958             module._c._create_method_from_trace(
    959                 method_name,
    960                 func,

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _slow_forward(self, *input, **kwargs)
   1088                 recording_scopes = False
   1089         try:
-> 1090             result = self.forward(*input, **kwargs)
   1091         finally:
   1092             if recording_scopes:

/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     26         def decorate_context(*args, **kwargs):
     27             with self.__class__():
---> 28                 return func(*args, **kwargs)
     29         return cast(F, decorate_context)
     30 

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in forward(self, imgs, size, augment, profile)
    508         if isinstance(imgs, torch.Tensor):  # torch
    509             with amp.autocast(enabled=autocast):
--> 510                 return self.model(imgs.to(p.device).type_as(p), augment, profile)  # inference
    511 
    512         # Pre-process

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _slow_forward(self, *input, **kwargs)
   1088                 recording_scopes = False
   1089         try:
-> 1090             result = self.forward(*input, **kwargs)
   1091         finally:
   1092             if recording_scopes:

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in forward(self, im, augment, visualize, val)
    397         b, ch, h, w = im.shape  # batch, channel, height, width
    398         if self.pt or self.jit:  # PyTorch
--> 399             y = self.model(im) if self.jit else self.model(im, augment=augment, visualize=visualize)
    400             return y if val else y[0]
    401         elif self.dnn:  # ONNX OpenCV DNN

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _slow_forward(self, *input, **kwargs)
   1088                 recording_scopes = False
   1089         try:
-> 1090             result = self.forward(*input, **kwargs)
   1091         finally:
   1092             if recording_scopes:

~/.cache/torch/hub/ultralytics_yolov5_master/models/yolo.py in forward(self, x, augment, profile, visualize)
    124         if augment:
    125             return self._forward_augment(x)  # augmented inference, None
--> 126         return self._forward_once(x, profile, visualize)  # single-scale inference, train
    127 
    128     def _forward_augment(self, x):

~/.cache/torch/hub/ultralytics_yolov5_master/models/yolo.py in _forward_once(self, x, profile, visualize)
    147             if profile:
    148                 self._profile_one_layer(m, x, dt)
--> 149             x = m(x)  # run
    150             y.append(x if m.i in self.save else None)  # save output
    151             if visualize:

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _slow_forward(self, *input, **kwargs)
   1088                 recording_scopes = False
   1089         try:
-> 1090             result = self.forward(*input, **kwargs)
   1091         finally:
   1092             if recording_scopes:

~/.cache/torch/hub/ultralytics_yolov5_master/models/common.py in forward(self, x)
    273 
    274     def forward(self, x):
--> 275         return torch.cat(x, self.d)
    276 
    277 

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 26 but got size 25 for tensor number 1 in the list.

I am using the latest YoloV5 Docker image (which nowadays also uses torch 1.10.1+cu113), but I am still getting the error.

I also get a pip error about the PyTorch version, and I am not sure whether this is a compatibility issue:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behavior is the source of the following dependency conflicts.
torchtext 0.11.0a0 requires torch==1.10.0a0+0aef44c, but you have torch 1.10.1+cu113 which is incompatible.

I was able to reproduce the error and fix the bug in your code.

import torch
from rfa_toolbox import create_graph_from_pytorch_model, visualize_architecture


def main():
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
    model.train()
    model.cpu()
    graph = create_graph_from_pytorch_model(model, input_res=(4, 3, 640, 640))
    visualize_architecture(graph, "model").view()

if __name__ == "__main__":
    main()

I will elaborate on the changes to the code:
First, the model was not deployed on the CPU, while the input tensor used for tracing was, which raises an error.
I should probably add some logic that detects the device automatically, though, since this is a trap that many will probably fall into.
Second, YoloV5 has somewhat exotic input size requirements of 640x640 pixels (compared to classifiers, which use a much smaller resolution by default and are far less picky about it). The tensor used for tracing (to obtain the graph) is 399x399 pixels by default, which is enough for most classifiers, but too small for YoloV5.
The tuple in my earlier code provides the shape of the larger tracing tensor. However, since that code was posted, version 1.4.2 has been released, which slightly changed the interface and moved this argument to the third position. This is why the code in this post passes it as a keyword argument, which the earlier snippet did not.
Third, the model needs to be in train mode, since otherwise control flow that depends directly on the input of the model is enabled during the forward pass. Behavior like this is not supported by the JIT compiler of PyTorch, which we rely on to extract the compute graph, and it will therefore raise an error in the process.
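
As a rough sanity check on the second point, the feature-map sizes can be worked out by hand. The sketch below is plain arithmetic, not part of RFA-Toolbox. The kernel, stride, and padding values and the 2x upsampling factor are assumptions about the yolov5s architecture that are not stated in this thread, so treat the numbers as illustrative.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# ASSUMPTION: (kernel, stride, padding) of the five stride-2 convolutions
# in the yolov5s backbone; these values are not taken from this thread.
ASSUMED_DOWNSAMPLING = [(6, 2, 2), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]


def pyramid_sizes(input_size):
    """Feature-map size after each assumed downsampling convolution."""
    sizes = []
    size = input_size
    for kernel, stride, padding in ASSUMED_DOWNSAMPLING:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


def first_concat_sizes(input_size):
    """Sizes of the two tensors that meet at the first upsample + concat."""
    sizes = pyramid_sizes(input_size)
    upsampled = sizes[-1] * 2  # deepest map, upsampled by an assumed factor of 2
    skip = sizes[-2]           # skip connection from one level higher
    return upsampled, skip


print(first_concat_sizes(399))  # (26, 25)
print(first_concat_sizes(640))  # (40, 40)
```

Under these assumptions, the default 399x399 tracing tensor yields 26 versus 25 at the concatenation, which matches the sizes in the error message, while 640x640 yields matching sizes of 40.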