Training problems with the ISIC dataset
Opened this issue · 2 comments
Frank-Cai0709 commented
Hello author, I have a question:
I followed your instructions to set the environment variables and downloaded the ISIC2017 raw data and preprocessed data from Google Drive, but during training the dsc and miou are always 0%.
==========num_iterations_per_epoch: 250===========
wandb: Network error (ReadTimeout), entering retry loop.
2024-04-10 16:10:45.853054: finished training epoch 3
2024-04-10 16:10:47.647442: Using splits from existing split file: /media/dell/D/cjt/cjt/unetv2/nnunetv2/data/preprocessed_data/Dataset122_ISIC2017/splits_final.json
2024-04-10 16:10:47.664358: The split file contains 1 splits.
2024-04-10 16:10:47.665259: Desired fold for training: 0
2024-04-10 16:10:47.665558: This split has 1500 training and 650 validation cases.
start computing score....
2024-04-10 16:26:39.131107: dsc: 0.00%
2024-04-10 16:26:39.134617: miou: 0.00%
2024-04-10 16:26:39.135679: acc: 83.25%, sen: 0.00%, spe: 100.00%
2024-04-10 16:26:39.137912: current best miou: 0.0 at epoch: 0, (0, 0.0, 0.0)
2024-04-10 16:26:39.138560: current best dsc: 0.0 at epoch: 0, (0, 0.0, 0.0)
2024-04-10 16:26:39.139042: finished real validation
/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:1115: RuntimeWarning: Mean of empty slice
I made some changes to the _internal_maybe_mirror_and_predict function: with deep supervision enabled, the network returns a tuple, so I take the second element of the tuple as the output, as shown below. Also, every mask in raw_data looks completely black; I assume your preprocessing remapped the pixel value 255 to 1. Could any of this affect training?
def _internal_maybe_mirror_and_predict(self, x: torch.Tensor) -> torch.Tensor:
    mirror_axes = self.allowed_mirroring_axes if self.use_mirroring else None
    x = x.to(torch.float16)
    # prediction = self.network(x)
    # with deep supervision the network returns a tuple; take only the second element
    prediction = self.network(x)[1]
    if mirror_axes is not None:
        # check for invalid numbers in mirror_axes
        # x should be 5d for 3d images and 4d for 2d. so the max value of mirror_axes cannot exceed len(x.shape) - 3
        assert max(mirror_axes) <= len(x.shape) - 3, 'mirror_axes does not match the dimension of the input!'

        num_predictions = 2 ** len(mirror_axes)
        if 0 in mirror_axes:
            # prediction += torch.flip(self.network(torch.flip(x, (2,))), (2,))
            prediction += torch.flip(self.network(torch.flip(x, (2,)))[1], (2,))
        if 1 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (3,)))[1], (3,))
        if 2 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (4,)))[1], (4,))
        if 0 in mirror_axes and 1 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (2, 3)))[1], (2, 3))
        if 0 in mirror_axes and 2 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (2, 4)))[1], (2, 4))
        if 1 in mirror_axes and 2 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (3, 4)))[1], (3, 4))
        if 0 in mirror_axes and 1 in mirror_axes and 2 in mirror_axes:
            prediction += torch.flip(self.network(torch.flip(x, (2, 3, 4)))[1], (2, 3, 4))
        prediction /= num_predictions
    return prediction
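As a quick sanity check on the all-black masks, one can inspect the unique label values. This is a minimal self-contained sketch of the suspected 255-to-1 remapping (using a synthetic array in place of an actual ISIC mask file); a mask with values {0, 1} displays as black in an image viewer but is still a valid label map:

```python
import numpy as np

# Synthetic stand-in for an ISIC mask: raw masks use {0, 255},
# and preprocessing typically remaps foreground 255 -> 1.
raw_mask = np.zeros((8, 8), dtype=np.uint8)
raw_mask[2:6, 2:6] = 255

preprocessed = (raw_mask > 0).astype(np.uint8)  # remap 255 -> 1

print(np.unique(raw_mask))      # raw labels: 0 and 255
print(np.unique(preprocessed))  # remapped labels: 0 and 1
```

Running the same `np.unique` check on an actual preprocessed mask should confirm whether the remapping happened.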
Frank-Cai0709 commented
Also, in deep_supervision.py I added the line target = F.interpolate(target, size=(256, 256), mode='bilinear', align_corners=False) to upsample the mask to 256x256; otherwise, when computing the loss for out2, the gt is only 128x128. I assume this has no real effect on training?
for i, inputs in enumerate(zip(*args)):
    if i == 0:
        continue
    output, target = inputs
    target = F.interpolate(target, size=(256, 256), mode='bilinear', align_corners=False)
    inputs = output, target
    l += weights[i] * self.loss(*inputs)
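One caveat with that workaround: mode='bilinear' blends neighbouring label values and produces fractional class ids along the lesion boundary, which can corrupt the cross-entropy targets. For integer label maps, mode='nearest' keeps the labels integral. A minimal sketch with a toy binary mask:

```python
import torch
import torch.nn.functional as F

# Toy binary mask shaped (N, 1, H, W), as the deep-supervision loss expects.
target = torch.zeros(1, 1, 128, 128)
target[:, :, 32:96, 32:96] = 1.0

# 'nearest' keeps labels integral; 'bilinear' would create values in (0, 1)
# along the boundary between the two classes.
resized = F.interpolate(target, size=(256, 256), mode='nearest')

print(torch.unique(resized))  # only 0. and 1. remain after resizing
```

Note that nnUNet's own deep supervision normally avoids this entirely by downsampling the ground truth once, during data loading, rather than interpolating it inside the loss.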
Without that change, the gt and output sizes don't match and the following error is raised:
Traceback (most recent call last):
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/run/run_training.py", line 311, in <module>
run_training_entry()
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/run/run_training.py", line 305, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/run/run_training.py", line 230, in run_training
nnunet_trainer.run_training(dataset_id=dataset_id)
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/nnUNetTrainer/ISICTrainer.py", line 145, in run_training
train_outputs.append(self.train_step(next(self.dataloader_train)))
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/nnUNetTrainer/ISICTrainer.py", line 196, in train_step
l = self.loss(output, target)
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/loss/deep_supervision.py", line 41, in forward
l += weights[i] * self.loss(*inputs)
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/loss/compound_losses.py", line 54, in forward
ce_loss = self.ce(net_output, target[:, 0].long()) \
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/dell/D/cjt/cjt/unetv2/nnunetv2/training/loss/robust_ce_loss.py", line 19, in forward
loss = super().forward(input, target.long())
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/dell/anaconda3/envs/unetv2/lib/python3.8/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: input and target batch or spatial sizes don't match: target [49, 128, 128], input [49, 1, 256, 256]
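The error itself is just F.cross_entropy requiring the input (N, C, H, W) and the target (N, H, W) to agree on the spatial dimensions. A minimal reproduction with a toy two-class output:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 2, 256, 256)          # (N, C, H, W) network output
target_ok = torch.zeros(2, 256, 256).long()   # (N, H, W), spatially matching
target_bad = torch.zeros(2, 128, 128).long()  # mismatched spatial size

loss = F.cross_entropy(logits, target_ok)     # succeeds

try:
    F.cross_entropy(logits, target_bad)
except RuntimeError as e:
    print(e)  # same "sizes don't match" failure as in the traceback above
```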
yaoppeng commented
我将nnUNet_plans.json
中的默认设置改了下。对于deep_supervision
有以下代码控制:
if self.deep_supervision:
    return seg_outs[::-1]
else:
    return seg_outs[-1]
During training and testing, the labels should be integers such as 0, 1, 2. For visualization you can rescale them to [0, 255]. Please try again and check whether it works.
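Following that suggestion, rescaling a {0, 1} prediction for viewing is a single multiplication, kept separate from the labels used for training. A sketch, assuming a binary mask:

```python
import numpy as np

# Binary prediction with labels {0, 1}, as used during training/testing.
pred = np.array([[0, 1],
                 [1, 0]], dtype=np.uint8)

# Rescale to [0, 255] only for visualization; never feed this back into
# the loss or the metrics, which expect integer class ids.
vis = (pred * 255).astype(np.uint8)

print(vis.min(), vis.max())  # 0 255
```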