Megvii-BaseDetection/cvpods

[BUG] cannot train YOLOv3

lucasjinreal opened this issue · 3 comments

Hi, I'm not sure whether this is a bug or something I missed on my part, but I have a question about the yolov3 preprocess_image.

Here is the error I got:

   mask[b, a, gj, gi] = 1
IndexError: index 23 is out of bounds for dimension 3 with size 14

This means the labels (w, h) were not rescaled to match my resized image: if the image is resized to 544 and the stride is 32, the grid is only 17 cells wide, so the grid index should not exceed 17, yet I got indices like 23.
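For reference, here is the grid arithmetic as I understand it (a minimal sketch, not the cvpods code); an index of 23 can only come from box coordinates that still belong to a larger, un-rescaled input:

# Minimal sketch of the grid arithmetic (my understanding, not the cvpods code).
img_size, stride = 544, 32
grid = img_size // stride       # 17, so valid grid indices are 0..16
cx = 750.0                      # box center x in pixels of the ORIGINAL image
gi = int(cx / stride)           # 23 -> out of bounds for a 17x17 grid
print(grid, gi)                 # 17 23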

And here is the preprocess_image code:

def preprocess_image(self, batched_inputs, training):
        """
        Normalize, pad and batch the input images.
        """
        images = [x["image"].to(self.device) for x in batched_inputs]
        bs = len(images)
        images = [self.normalizer(x) for x in images]

        images = ImageList.from_tensors(
            images, size_divisibility=0, pad_ref_long=True)
        logger.info('images ori shape: {}'.format(images.tensor.shape))
        logger.info('images ori shape: {}'.format(images.image_sizes))

        # sync image size for all gpus
        comm.synchronize()
        if training and self.iter % self.change_iter == 0:
            if self.iter < self.max_iter - 20000:
                meg = torch.LongTensor(1).to(self.device)
                comm.synchronize()
                if comm.is_main_process():
                    size = np.random.choice(self.multi_size)
                    meg.fill_(size)

                if comm.get_world_size() > 1:
                    comm.synchronize()
                    dist.broadcast(meg, 0)
                self.size = meg.item()

                comm.synchronize()
            else:
                self.size = 608

        if training:
            # resize image inputs
            modes = ['bilinear', 'nearest', 'bicubic', 'area']
            mode = modes[random.randrange(4)]
            if mode == 'bilinear' or mode == 'bicubic':
                images.tensor = F.interpolate(
                    images.tensor, size=[self.size, self.size], mode=mode, align_corners=False)
            else:
                images.tensor = F.interpolate(
                    images.tensor, size=[self.size, self.size], mode=mode)

            if "instances" in batched_inputs[0]:
                gt_instances = [
                    x["instances"].to(self.device) for x in batched_inputs
                ]
            elif "targets" in batched_inputs[0]:
                log_first_n(
                    logging.WARN,
                    "'targets' in the model inputs is now renamed to 'instances'!",
                    n=10)
                gt_instances = [
                    x["targets"].to(self.device) for x in batched_inputs
                ]
            else:
                gt_instances = None

            targets = [
                torch.cat(
                    [instance.gt_classes.float().unsqueeze(-1), instance.gt_boxes.tensor], dim=-1
                )
                for instance in gt_instances
            ]
            labels = torch.zeros((bs, 100, 5))
            for i, target in enumerate(targets):
                labels[i][:target.shape[0]] = target
            labels[:, :, 1:] = labels[:, :, 1:] / 512. * self.size
        else:
            labels = None

        self.iter += 1
        return images, labels

The image is resized twice, but the labels don't seem to change accordingly. Any idea where the error comes from? (Maybe your code rescales images and labels together automatically somewhere, but I can't find it.)
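For what it's worth, here is a sketch of the scaling I expected instead (the helper name and the assumption that the boxes are absolute-pixel x1y1x2y2 coordinates in the padded tensor are mine, not from cvpods): divide by the actual padded size captured before F.interpolate rather than the hard-coded 512.

import torch

def rescale_labels(labels, pad_hw, new_size):
    # Hypothetical helper (not in cvpods): rescale absolute-pixel boxes from the
    # padded input size, captured before F.interpolate, to the new square size,
    # instead of dividing by a hard-coded 512.
    pad_h, pad_w = pad_hw
    labels = labels.clone()
    labels[:, :, 1::2] = labels[:, :, 1::2] / pad_w * new_size  # x1, x2
    labels[:, :, 2::2] = labels[:, :, 2::2] / pad_h * new_size  # y1, y2
    return labels

# usage (inside preprocess_image):
#   pad_hw = images.tensor.shape[-2:]   # capture before F.interpolate
#   ... F.interpolate(...) ...
#   labels = rescale_labels(labels, pad_hw, self.size)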

Is this yolov3 on the COCO dataset or on your own dataset?

COCO. I found it hard to converge on COCO as well, so I switched to rectangular input sizes rather than forcing a square resize. Also, your code has a bug in build_target that swaps w and h; a sketch of that kind of mix-up is below.
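To illustrate the kind of mix-up I mean (a generic sketch, not the actual cvpods build_target): the mask is indexed as [batch, anchor, gy, gx], so the row index must come from y and the column index from x; swapping them only blows up on non-square grids, which is exactly when rectangular inputs are used.

import torch

# Generic sketch of the w/h swap pitfall (not the actual cvpods build_target).
stride, grid_h, grid_w = 32, 14, 24          # e.g. a 448x768 rectangular input
mask = torch.zeros(1, 3, grid_h, grid_w)     # [batch, anchor, gy, gx]
cx, cy = 750.0, 400.0                        # box center in pixels
gi, gj = int(cx / stride), int(cy / stride)  # gi=23 (column), gj=12 (row)
mask[0, 0, gj, gi] = 1                       # correct: row from y, column from x
# mask[0, 0, gi, gj] = 1                     # swapped: IndexError, since 23 >= 14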

Thanks for your report, we will check this in our internal version.