rwightman/efficientdet-pytorch

PyTorch 1.12

mayeroa opened this issue · 4 comments

Hi,

First of all, thanks for this amazing repo.
I have a question about what is probably an error when using the effdet package with PyTorch 1.12.
Using the following code:

import torch
from effdet.factory import create_model
from typing import Any, Dict, List, Tuple, Union


class EfficientDet(torch.nn.Module):
    """Implementation of the EfficientDet model.

    Args:
        model_name: Name of the detection model to use. Defaults to 'efficientdet_d0'.
        num_classes: Number of classes to detect. Defaults to 90.
        pretrained: Use pretrained weights. Defaults to False.
        pretrained_backbone: Use pretrained weights for the backbone only. Defaults to True.
        soft_nms: Use soft-nms; this is incredibly slow. Defaults to False.
        max_det_per_image: Max detections per image limit, output of NMS. Defaults to 100.
        confidence_threshold: Filter out detections below this confidence threshold. Defaults to 0.5.

    """

    def __init__(
        self,
        model_name: str = 'efficientdet_d0',
        num_classes: int = 90,
        pretrained: bool = False,
        pretrained_backbone: bool = True,
        soft_nms: bool = False,
        max_det_per_image: int = 100,
        confidence_threshold: float = 0.5,
    ) -> None:
        super().__init__()

        self.model = create_model(
            model_name,
            bench_task='train',
            num_classes=num_classes,
            pretrained=pretrained,
            pretrained_backbone=pretrained_backbone,
            bench_labeler=True,
            soft_nms=soft_nms,
            max_det_per_image=max_det_per_image,
        )
        self.confidence_threshold = confidence_threshold

    def forward(
        self, images: torch.Tensor, targets: List[Dict[str, torch.Tensor]]
    ) -> Union[Dict[str, torch.Tensor], Tuple[List[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]:
        """Pass-through EfficientDet model.

        Two scenarios:
            * In training mode, the model outputs only a dictionary of losses with the following keys:
                * `class_loss`: Classification loss.
                * `box_loss`: Box regression loss.
                * `loss`: Weighted sum of the classification and regression losses.
            * In evaluation mode, the model outputs a tuple made of:
                * A list of dictionaries with predicted `boxes`, `scores` and `labels` tensors for each image.
                * A dictionary of losses (same as in training mode).

        """
        _, _, H, W = images.shape
        # Remap targets to comply with the effdet target format
        new_targets: Dict[str, Any] = {}
        new_targets['bbox'] = [target['boxes'][:, [1, 0, 3, 2]].float() for target in targets]  # xyxy to yxyx
        new_targets['cls'] = [target['labels'].float() for target in targets]
        new_targets['img_size'] = torch.stack([torch.LongTensor([H, W]) for _ in targets], dim=0).to(images.device)
        new_targets['img_scale'] = torch.stack([torch.LongTensor([1]) for _ in targets], dim=0).to(images.device)

        losses_dict = self.model(images, new_targets)
        if self.training:
            return losses_dict

        # Remap detections to comply with the DLCooker format
        remap_detections = []
        if 'detections' in losses_dict:
            batch_detections = losses_dict.pop('detections')
            for image_detections in batch_detections:
                # Filter with confidence score
                scores_over_threshold = image_detections[:, 4].float() > self.confidence_threshold
                remap_detections.append(
                    {
                        'boxes': image_detections[scores_over_threshold, :4].float(),
                        'scores': image_detections[scores_over_threshold, 4].float(),
                        'labels': image_detections[scores_over_threshold, 5].long(),
                    }
                )
        return remap_detections, losses_dict

# No weights
model = EfficientDet(num_classes=10, pretrained=False, pretrained_backbone=False)
image = torch.rand(1, 3, 512, 512)
targets = [{'boxes': torch.FloatTensor([[10, 10, 200, 200]]), 'labels': torch.LongTensor([5])}]

# Train mode
model.train()
losses_dict = model(image, targets)
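
For completeness, the evaluation path described in the forward() docstring would be exercised like this (a sketch following the contract above; not needed to reproduce the error below):

# Eval mode: per the forward() docstring above, returns (detections, losses_dict)
model.eval()
with torch.no_grad():
    detections, losses_dict = model(image, targets)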

Running the train-mode call above raises the following error (there was no error with the previous version of PyTorch):

Traceback (most recent call last):
  File "test.py", line 95, in <module>
    losses_dict = model(image, targets)
  File "/.../lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "test.py", line 68, in forward
    losses_dict = self.model(images, new_targets)
  File "/.../lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/.../lib/python3.8/site-packages/effdet/bench.py", line 141, in forward
    cls_targets, box_targets, num_positives = self.anchor_labeler.batch_label_anchors(
  File "/...8/lib/python3.8/site-packages/effdet/anchors.py", line 398, in batch_label_anchors
    box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

(This concerns the following line of code: https://github.com/rwightman/efficientdet-pytorch/blob/master/effdet/anchors.py#L398)

Replacing the call to .view() with .reshape() or .contiguous().view() seems to work.
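
For anyone who wants to see the failure mode in isolation, here is a minimal standalone snippet (independent of effdet) that reproduces the same class of error:

import torch

# A transposed tensor is non-contiguous: its strides no longer describe
# a plain row-major layout, so .view() cannot reinterpret the storage.
t = torch.arange(6).reshape(2, 3).t()

try:
    t.view(6)  # raises the same RuntimeError as in effdet/anchors.py
except RuntimeError as e:
    print(e)

print(t.reshape(6))            # works: copies when it has to
print(t.contiguous().view(6))  # works: explicit copy first, then view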
Do you encounter the same issue running the reproduction code at the top with the brand-new PyTorch version (1.12)? Is there any other workaround?

Best regards.

Environment

  • PyTorch 1.12
  • Python 3.8.12

@mayeroa it's not uncommon for this to happen (slight changes in the specifics or strictness of memory layout across PyTorch versions), so the reshape change is safe and appropriate. I'd accept a PR, although we should change both view calls there, and also the ones in the function around line 356. Otherwise I'll get to this later in the week and test PyTorch 1.12 training after I release an updated version of timm (which I'm focused on right now).
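
For reference, based on the line quoted in the traceback, the change amounts to swapping .view() for .reshape() (and likewise for the other view calls mentioned):

# effdet/anchors.py, in batch_label_anchors (sketch; surrounding context abbreviated)
# before:
#     box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1])
# after: .reshape() falls back to a copy when the slice is non-contiguous
#     box_targets[count:count + steps].reshape([feat_size[0], feat_size[1], -1])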

@rwightman Thanks for your feedback. I will create a PR right away with the discussed changes.

Dangerous solution. Quoting the linked Stack Overflow answer:

> It means that torch.reshape may return a copy or a view of the original tensor. You can not count on that to return a view or a copy. According to the developer: if you need a copy use clone(); if you need the same storage use view(). The semantics of reshape() are that it may or may not share the storage and you don't know beforehand.

https://stackoverflow.com/questions/49643225/whats-the-difference-between-reshape-and-view-in-pytorch
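
A quick illustration of that caveat:

import torch

a = torch.arange(6).reshape(2, 3)

v = a.reshape(3, 2)                  # contiguous input: reshape returns a view
print(v.data_ptr() == a.data_ptr())  # True -> shares storage with a

c = a.t().reshape(6)                 # non-contiguous input: reshape must copy
print(c.data_ptr() == a.data_ptr())  # False -> new storage

Whether the potential copy matters depends on whether the caller later writes through the result; as noted above, the maintainer judged the swap safe for this code path.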

@rwightman I think the pip version of effdet (v0.3.0) has not been updated with the fix yet. For those who installed the effdet package via pip, the error still exists. If possible, could you please update the pip package?
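(In the meantime, installing straight from the repository, e.g. `pip install git+https://github.com/rwightman/efficientdet-pytorch.git`, should pick up the fix from master.)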