Why does the loss stay at 2.x when training on my own data with the code provided here...
TonyZPW opened this issue · 9 comments
The loss starts at 2.x and then just stays around 2.x...
How about first checking whether the predictions actually look reasonable? The loss here is 2*BCELoss + 1*DiceLoss + 1*LovaszHingeLoss, so a value around 2.x is not actually large. You can refer to this project:
It uses the trained SegFormer_B2 with the same loss, and the computed loss comes out as follows:
Tensor(shape=[1], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
[3.13252783])
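For a concrete sense of the scale: with the 2/1/1 weighting, even per-term losses well below 1 sum to a total in the 2-3 range. A minimal sketch (the per-term values below are made up purely for illustration, not measured from any model):

bce, dice, lovasz = 0.70, 0.55, 0.60   # hypothetical per-term losses, each below 1
coef = [2, 1, 1]                       # the MixedLoss weights mentioned above
total = coef[0] * bce + coef[1] * dice + coef[2] * lovasz
print(total)                           # 2.55 -- a "2.x" total, same ballpark as the 3.13 above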
Bro, I retrained a SegFormer model on AI Studio following this code.
2022-02-14 17:23:52 [INFO] [TRAIN] epoch: 1, iter: 22420/49039, loss: 1.9413, lr: 0.000008, batch_cost: 0.9691, reader_cost: 0.00017, ips: 16.5098 samples/sec | ETA 07:09:57
2022-02-14 16:38:31 [INFO] [EVAL] #Images: 56046 mIoU: 0.4901 Acc: 0.8622 Kappa: 0.8327 Dice: 0.5694
2022-02-14 16:38:31 [INFO] [EVAL] Class IoU:
[0.8606 0.1196]
2022-02-14 16:38:31 [INFO] [EVAL] Class Acc:
[0.9991 0.1202]
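A quick sanity check on these numbers (mIoU is the mean of the per-class IoUs, and the per-class accuracies show the model is heavily biased towards class 0):

class_iou = [0.8606, 0.1196]
class_acc = [0.9991, 0.1202]
print(sum(class_iou) / len(class_iou))   # 0.4901, matching the mIoU in the EVAL line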
But the predictions ended up looking like this...
# The training set is defined like this:
import os
import random

def create_list(data_path):
    image_path = os.path.join(data_path, 'img')
    label_path = os.path.join(data_path, 'gt')
    data_names = os.listdir(image_path)
    random.shuffle(data_names)  # shuffle the data
    with open(os.path.join(data_path, 'train_list.txt'), 'w') as tf:
        with open(os.path.join(data_path, 'val_list.txt'), 'w') as vf:
            for idx, data_name in enumerate(data_names):
                img = os.path.join(image_path, data_name)
                lab = os.path.join(label_path, data_name.replace('jpg', 'png'))
                if idx % 8 == 0:  # every 8th sample goes to the validation set (~87.5% / 12.5% split)
                    vf.write(img + ' ' + lab + '\n')
                else:
                    tf.write(img + ' ' + lab + '\n')
    print('Data list generation completed')

create_list(data_path)
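As a quick sanity check on the generated lists (check_lists below is just an illustrative helper, not part of the project):

def check_lists(data_path):
    # count the samples in each split and confirm every referenced file exists
    for name in ('train_list.txt', 'val_list.txt'):
        with open(os.path.join(data_path, name)) as f:
            pairs = [line.split(' ') for line in f.read().splitlines() if line]
        missing = [p for img, lab in pairs for p in (img, lab) if not os.path.exists(p)]
        print(name, len(pairs), 'samples,', len(missing), 'missing files')

check_lists(data_path)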
=============================
My model definition is this:
base_lr = 3e-5
epochs = 2
batch_size = 16
iters = epochs * len(train_dataset) // batch_size
model = SegFormer_B2(num_classes=2)
lr = paddle.optimizer.lr.PolynomialDecay(base_lr, decay_steps=iters // epochs, end_lr=base_lr / 5)
optimizer = paddle.optimizer.AdamW(lr, beta1=0.9, beta2=0.999, weight_decay=0.01, parameters=model.parameters())
losses = {}
losses["types"] = [MixedLoss([BCELoss(), DiceLoss(), LovaszHingeLoss()], [2, 1, 1])] # * 2
losses["coef"] = [1]
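As a consistency check against the log above (the training-set size below is an approximation inferred from the 49039 total iterations, and the formula assumes paddle's PolynomialDecay defaults of power=1.0 and cycle=False):

base_lr, epochs, batch_size = 3e-5, 2, 16
n_train = 392_312                       # approximate size implied by "iter .../49039"
iters = epochs * n_train // batch_size
print(iters)                            # 49039, matching the log

# lr schedule: linear decay over iters // epochs steps, so it bottoms out after epoch 1
decay_steps, end_lr = iters // epochs, base_lr / 5
step = 22420
lr = end_lr + (base_lr - end_lr) * max(0.0, 1 - step / decay_steps)
print(lr)                               # ~8e-06, i.e. the 0.000008 logged at iter 22420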
========================
The transforms are these:
# Build the training set
train_transforms = [
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(),
    T.RandomScaleAspect(),
    T.RandomBlur(),
    T.Resize(target_size=(512, 512)),
    T.Normalize()
]
train_dataset = Dataset(
    transforms=train_transforms,
    dataset_root=data_path,
    num_classes=2,
    mode='train',
    train_path=train_path,
    separator=' '
)
# Build the validation set
val_transforms = [
    T.Resize(target_size=(512, 512)),
    T.Normalize()
]
val_dataset = Dataset(
    transforms=val_transforms,
    dataset_root=data_path,
    num_classes=2,
    mode='val',
    val_path=val_path,
    separator=' '
)
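For reference, a sketch of how these pieces are then usually passed to PaddleSeg's training loop (assuming the standard paddleseg.core.train API used by API-mode projects; save_interval and the other values not defined above are assumptions):

from paddleseg.core import train

train(
    model=model,
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    optimizer=optimizer,
    losses=losses,
    iters=iters,
    batch_size=batch_size,
    save_interval=2000,   # assumed value
    log_iters=10,
    save_dir='output',
    use_vdl=True
)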
Could you help me figure out where the problem is? I've looked through quite a few of your projects, and small training sets fit quite well easily. But on that 400k+ training set, why is there such a big gap between what I train and what you trained?
Oh, I see you have a Data filtering step here. Could missing that step be the cause?
"The first 2 steps need to be performed in AI Studio. Refer to the AI Studio project. Locally, you only need to prepare the data and start from step 3; label.py is located in the trained/tools folder." I couldn't find the label.py file needed for this step 3. Could you send it to me separately? My email is zpw19890115@126.com. Thanks.
Sorry, I'm away these few days without a computer. Could I get back to you with the details in 4-5 days? Thanks~
https://aistudio.baidu.com/aistudio/datasetdetail/102929/0 Try the current dataset; it no longer needs so much preprocessing. The earlier problem was that the labels should be 0-1 rather than 0-255.
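In case others hit the same thing, here is a minimal sketch for checking (and, under the 0/255 assumption above, fixing) the label values; the gt directory name comes from the create_list code earlier, everything else is an assumption:

import os
import numpy as np
from PIL import Image

def check_and_fix_labels(label_dir):
    # ground truth for num_classes=2 must contain only the values 0 and 1
    for name in os.listdir(label_dir):
        path = os.path.join(label_dir, name)
        lab = np.asarray(Image.open(path))
        values = set(np.unique(lab).tolist())
        if not values <= {0, 1}:
            # assume a 0/255 mask and map foreground to 1
            Image.fromarray((lab > 127).astype(np.uint8)).save(path)
            print(f'fixed {name}: values were {sorted(values)}')

check_and_fix_labels(os.path.join(data_path, 'gt'))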