Training problems with the ISIC dataset
Opened this issue · 2 comments
Frank-Cai0709 commented
Hello author, I have a question:
I followed your instructions to set the environment variables and downloaded the ISIC2017 raw data and preprocessed data from Google Drive, but during training the dsc and miou are always 0%.
==========num_iterations_per_epoch: 250===========
wandb: Network error (ReadTimeout), entering retry loop.
2024-04-10 16:10:45.853054: finished training epoch 3
2024-04-10 16:10:47.647442: Using splits from existing split file: /media/dell/D/cjt/cjt/unetv2/nnunetv2/data/preprocessed_data/Dataset122_ISIC2017/splits_final.json
2024-04-10 16:10:47.664358: The split file contains 1 splits.
2024-04-10 16:10:47.665259: Desired fold for training: 0
2024-04-10 16:10:47.665558: This split has 1500 training and 650 validation cases.
start computing score....
2024-04-10 16:26:39.131107: dsc: 0.00%
2024-04-10 16:26:39.134617: miou: 0.00%
2024-04-10 16:26:39.135679: acc: 83.25%, sen: 0.00%, spe: 100.00%
2024-04-10 16:26:39.137912: current best miou: 0.0 at epoch: 0, (0, 0.0, 0.0)
2024-04-10 16:26:39.138560: current best dsc: 0.0 at epoch: 0, (0, 0.0, 0.0)
2024-04-10 16:26:39.139042: finished real validation
/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:1115: RuntimeWarning: Mean of empty slice
I made some changes to the _internal_maybe_mirror_and_predict function: with deep supervision enabled, the network returns a tuple, so I take the second element of the tuple as the output, as shown below. Also, every mask in raw_data looks completely black; I assume your preprocessing remapped the pixel value 255 to 1. Could any of this affect training?
def _internal_maybe_mirror_and_predict(self, x: torch.Tensor) -> torch.Tensor:
    mirror_axes = self.allowed_mirroring_axes if self.use_mirroring else None
    x = x.to(torch.float16)
    # prediction = self.network(x)
    # with deep supervision the network returns a tuple; take only the second element
    prediction = self.network(x)[1]
    if mirror_axes is not None:
        # check for invalid numbers in mirror_axes
        # x should be 5d for 3d images and 4d for 2d. so the max value of mirror_axes cannot exceed len(x.shape) - 3
        assert max(mirror_axes) <= len(x.shape) - 3, 'mirror_axes does not match the dimension of the input!'

        num_predictions = 2 ** len(mirror_axes)
        if 0 in mirror_axes:
            # prediction += torch.flip(self.network(torch.flip(x, (2,))), (2,))
            prediction += torch.flip(self.network(torch.flip(x, (2,)))[1], (2,))
        if 1 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (3,)))[1], (3,))
        if 2 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (4,)))[1], (4,))
        if 0 in mirror_axes and 1 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (2, 3)))[1], (2, 3))
        if 0 in mirror_axes and 2 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (2, 4)))[1], (2, 4))
        if 1 in mirror_axes and 2 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (3, 4)))[1], (3, 4))
        if 0 in mirror_axes and 1 in mirror_axes and 2 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (2, 3, 4)))[1], (2, 3, 4))
        prediction /= num_predictions
    return prediction
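As a quick sanity check on the all-black masks, one can inspect the unique label values. This is a minimal self-contained sketch of the suspected 255-to-1 remapping (using a synthetic array in place of an actual ISIC mask file); a mask with values {0, 1} displays as black in an image viewer but is still a valid label map:

```python
import numpy as np

# Synthetic stand-in for an ISIC mask: raw masks use {0, 255},
# and preprocessing typically remaps foreground 255 -> 1.
raw_mask = np.zeros((8, 8), dtype=np.uint8)
raw_mask[2:6, 2:6] = 255

preprocessed = (raw_mask > 0).astype(np.uint8)  # remap 255 -> 1

print(np.unique(raw_mask))      # raw labels: 0 and 255
print(np.unique(preprocessed))  # remapped labels: 0 and 1
```

Running the same `np.unique` check on an actual preprocessed mask should confirm whether the remapping happened.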
Frank-Cai0709 commented
Also, in deep_supervision.py I added the line target = F.interpolate(target, size=(256, 256), mode='bilinear', align_corners=False) to upsample the mask to 256x256; otherwise, when computing the loss for out2, the gt is only 128x128. I assume this has no real effect on training?
for i, inputs in enumerate(zip(*args)):
    if i == 0:
        continue
    output, target = inputs
    target = F.interpolate(target, size=(256, 256), mode='bilinear', align_corners=False)
    inputs = output, target
    l += weights[i] * self.loss(*inputs)
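One caveat with that workaround: mode='bilinear' blends neighbouring label values and produces fractional class ids along the lesion boundary, which can corrupt the cross-entropy targets. For integer label maps, mode='nearest' keeps the labels integral. A minimal sketch with a toy binary mask:

```python
import torch
import torch.nn.functional as F

# Toy binary mask shaped (N, 1, H, W), as the deep-supervision loss expects.
target = torch.zeros(1, 1, 128, 128)
target[:, :, 32:96, 32:96] = 1.0

# 'nearest' keeps labels integral; 'bilinear' would create values in (0, 1)
# along the boundary between the two classes.
resized = F.interpolate(target, size=(256, 256), mode='nearest')

print(torch.unique(resized))  # only 0. and 1. remain after resizing
```

Note that nnUNet's own deep supervision normally avoids this entirely by downsampling the ground truth once, during data loading, rather than interpolating it inside the loss.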
Without that change, the gt and output sizes don't match and the following error is raised:
Traceback (most recent call last):
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/run/run_training.py", line 311, in <module>
run_training_entry()
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/run/run_training.py", line 305, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/run/run_training.py", line 230, in run_training
nnunet_trainer.run_training(dataset_id=dataset_id)
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/nnUNetTrainer/ISICTrainer.py", line 145, in run_training
train_outputs.append(self.train_step(next(self.dataloader_train)))
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/nnUNetTrainer/ISICTrainer.py", line 196, in train_step
l = self.loss(output, target)
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/loss/deep_supervision.py", line 41, in forward
l += weights[i] * self.loss(*inputs)
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/loss/compound_losses.py", line 54, in forward
ce_loss = self.ce(net_output, target[:, 0].long()) \
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/loss/robust_ce_loss.py", line 19, in forward
loss = super().forward(input, target.long())
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: input and target batch or spatial sizes don't match: target [49, 128, 128], input [49, 1, 256, 256]
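The error itself is just F.cross_entropy requiring the input (N, C, H, W) and the target (N, H, W) to agree on the spatial dimensions. A minimal reproduction with a toy two-class output:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 2, 256, 256)          # (N, C, H, W) network output
target_ok = torch.zeros(2, 256, 256).long()   # (N, H, W), spatially matching
target_bad = torch.zeros(2, 128, 128).long()  # mismatched spatial size

loss = F.cross_entropy(logits, target_ok)     # succeeds

try:
    F.cross_entropy(logits, target_bad)
except RuntimeError as e:
    print(e)  # same "sizes don't match" failure as in the traceback above
```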
yaoppeng commented
我将nnUNet_plans.json
中的默认设置改了下。对于deep_supervision
有以下代码控制:
if self.deep_supervision:
    return seg_outs[::-1]
else:
    return seg_outs[-1]
During training and testing, the labels should be integers such as 0, 1, 2. For visualization you can rescale them to [0, 255]. Please try again and check whether it works.
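Following that suggestion, rescaling a {0, 1} prediction for viewing is a single multiplication, kept separate from the labels used for training. A sketch, assuming a binary mask:

```python
import numpy as np

# Binary prediction with labels {0, 1}, as used during training/testing.
pred = np.array([[0, 1],
                 [1, 0]], dtype=np.uint8)

# Rescale to [0, 255] only for visualization; never feed this back into
# the loss or the metrics, which expect integer class ids.
vis = (pred * 255).astype(np.uint8)

print(vis.min(), vis.max())  # 0 255
```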