Hi, could you tell me which dataset you used?
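The data augmentation part places no requirements on the dataset format. Any dataset works as long as it outputs image, boxes, labels, image_name.
You can write one based on this: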
Changeable/changeable/dataset.py, line 44 (commit a2db3be)
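Below is a minimal sketch of a dataset that satisfies that contract. It assumes the images are already in memory as H x W x 3 8-bit arrays and that boxes are xyxy pixel coordinates. The class name SimpleDetectionDataset and the sample layout are hypothetical and not part of Changeable.

```python
import numpy as np


class SimpleDetectionDataset:
    """Hypothetical in-memory dataset returning (image, boxes, labels, image_name)."""

    def __init__(self, samples):
        # samples: list of (image_name, image, boxes, labels) tuples (assumed layout)
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image_name, image, boxes, labels = self.samples[index]
        # Give every item the same types so a batch can be stacked.
        # Assumes 8-bit image data: asarray casts without rescaling.
        image = np.asarray(image, dtype=np.uint8)                   # H x W x 3
        boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)  # xyxy pixels (assumed)
        labels = np.asarray(labels, dtype=np.int64).reshape(-1)     # one label per box
        return image, boxes, labels, image_name
```

A real dataset would read the image and its annotation from disk inside `__getitem__`. What matters here is the returned tuple and its consistent dtypes.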
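If you annotated your own data with labelImg, you can use this function to generate a VOC-format dataset. (A function that converts YOLO-format datasets to VOC format is also provided.)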
Changeable/changeable/utils/dataset.py, line 85 (commit a2db3be)
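For reference, a labelImg/VOC annotation is an XML file with one <object> element per box. Each object holds a <name> and a <bndbox> with xmin/ymin/xmax/ymax. The standard-library sketch below reads one such annotation. parse_voc_annotation is a hypothetical helper, not the function linked above. It returns class-name strings rather than indices because the library's label numbering isn't shown in this thread.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Parse one VOC/labelImg XML annotation string.

    Returns (filename, boxes, names): boxes are [xmin, ymin, xmax, ymax]
    in pixels, names are the class-name strings of the objects.
    """
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes, names = [], []
    for obj in root.findall("object"):  # one <object> element per box
        bndbox = obj.find("bndbox")
        boxes.append([float(bndbox.findtext(tag))
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
        names.append(obj.findtext("name").strip())
    return filename, boxes, names
```

For a file on disk, `ET.parse(path).getroot()` yields the same root element as `ET.fromstring` does for a string.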
You can also download the VOC dataset directly: http://host.robots.ox.ac.uk/pascal/VOC/
Hi, while using this I found that setting num_workers = 4 produces the following error:
  File "E:/yolox-trans/train.py", line 234, in <module>
    fit_one_epoch(model_train, model, yolo_loss, loss_history, optimizer, epoch,
  File "E:\yolox-trans\utils\utils_fit.py", line 14, in fit_one_epoch
    for iteration, batch in enumerate(gen):
  File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
    data = self._next_data()
  File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1111, in _process_data
    data.reraise()
  File "D:\miniconda\envs\py38\lib\site-packages\torch\_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\_utils\worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "D:\miniconda\envs\py38\lib\site-packages\changeable\dataloader.py", line 41, in __call__
    images = default_collate(images)
  File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\_utils\collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: result type Float can't be cast to the desired output type Byte
Judging from your error, the image returned by the dataset's __getitem__ is probably not a numpy.ndarray.
This is what the dataset's __getitem__ returns:
Changeable/changeable/dataset.py, line 44 (commit 957c17a)
where image comes from:
Changeable/changeable/dataset.py, line 76 (commit 957c17a)
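The last line of the traceback gives a concrete check. "Float" and "Byte" are PyTorch's names for float32 and uint8. In the collate.py shown in the traceback, when default_collate runs inside a worker process it stacks into a preallocated shared-memory tensor built from the first element of the batch. The output dtype therefore follows item 0, and a later float image can't be written into a uint8 output. This also explains why num_workers=1 fails too. The sketch below uses a hypothetical helper, check_batch_images, that lists the type, dtype, or shape mismatches that would break that stack. It can be run on a handful of images pulled from the dataset by hand.

```python
import numpy as np


def check_batch_images(images):
    """List reasons a batch of images could fail to stack in default_collate."""
    problems = []
    first = images[0]
    for i, img in enumerate(images):
        if not isinstance(img, np.ndarray):
            problems.append(f"item {i}: {type(img).__name__}, expected numpy.ndarray")
        elif isinstance(first, np.ndarray):
            # The stacked output takes item 0's dtype and shape, so compare against it.
            if img.dtype != first.dtype:
                problems.append(f"item {i}: dtype {img.dtype} != {first.dtype} (item 0)")
            if img.shape != first.shape:
                problems.append(f"item {i}: shape {img.shape} != {first.shape} (item 0)")
    return problems
```

An empty list means the images agree in type, dtype, and shape. Any entry names the offending position in the batch.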
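@yatengLG Hi, my image output is a numpy.ndarray. When I run the small demo you gave, the same problem also appears after a few images have been processed, even with num_workers=1.
My dataset is my own rather than VOC, but it is annotated the same way.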
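@liu-ai-z Did it fail partway through a run?
If so, the problem is in the data. Set num_workers in the dataloader to 1,
then run it again and see which sample causes the problem.
From the output you first posted: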
RuntimeError: Caught RuntimeError in DataLoader worker process 3.
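The message says the error occurred in worker process 3. From that I infer the run got partway through and then failed because of a problematic sample.
Your output also contains this frame: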
  File "D:\miniconda\envs\py38\lib\site-packages\changeable\dataloader.py", line 41, in __call__
    images = default_collate(images)
So the error is most likely caused by one problematic image.
from changeable.utils.display import draw_boxes, plot_image
from changeable.dataset import VOCDataset
from changeable.dataloader import dataloader
from changeable.transforms import *
from changeable.anchor import AnchorsAssignerWH, AnchorsGenerator

if __name__ == '__main__':
    with open('pcb_classes.txt', 'r') as f:  # class-name file, one class name per line
        lines = f.readlines()
    classes_name = tuple([line.rstrip('\n') for line in lines])

    dataset = VOCDataset(root='PCB',  # root directory of the VOC-format dataset
                         classes_name=classes_name,
                         is_train=True,
                         transforms=Compose([  # every augmentation is listed here, for demonstration only
                             Resize((300, 300)),
                             AdaptiveResize((300, 300)),
                             Scaled(1.1),
                             CropIou(0.5),
                             CropSize((300, 300)),
                             DivideStds((1, 1, 1)),
                             SubtractMeans((0, 0, 0)),
                             GaussNoise(),
                             SalePepperNoise(),
                             GaussBlur(),
                             MotionBlue(),
                             Cutout(),
                             RandomFlipLR(),
                             RandomFlipUD(),
                             ShuffleChannels(),
                             ChangeContrast(),
                             ChangeHue(),
                             ChangeBrightness(),
                             ChangeSaturation(),
                             ConvertBoxesToPercentage(),
                             ConvertBoxesToValue(),
                             ConvertBoxesForm('xyxy', 'cxcywh'),
                             ConvertBoxesForm('cxcywh', 'xyxy'),
                         ])
                         )

    anchors = AnchorsGenerator(image_size=(600, 600),
                               feature_maps_size=((76, 76), (38, 38), (19, 19)),
                               anchors_size=(((10, 13), (16, 30), (33, 23)),
                                             ((30, 61), (62, 45), (59, 119)),
                                             ((116, 90), (156, 198), (373, 326))),
                               form='xyxy',
                               clip=True
                               )
    anchors_assigner = AnchorsAssignerWH(anchors, 3)
    loader = dataloader(dataset, batch_size=4, resize=(600, 600), use_mosaic=True,
                        anchors_assigner=anchors_assigner, shuffle=True, num_workers=4)

    for i, (img, box, lab, ids) in enumerate(loader):
        print(i)
        print(img.size())
        print(box.size())
        print(lab.size())
        # Visualize the first sample of the batch.
        img, box, lab, img_name = img[0], box[0], lab[0], ids[0]
        img = img.permute((1, 2, 0)).numpy()  # CHW -> HWC for plotting
        box, lab = box.numpy(), lab.numpy()
        box, lab = box[lab > 0], lab[lab > 0]  # keep only entries with a positive label
        img = draw_boxes(img, box, lab, label_name=classes_name)
        plot_image(img)
To quickly find which image is the problem, you can do this:
# Set shuffle=False so the data order is not shuffled, and num_workers=1 so a single worker process loads the data.
loader = dataloader(dataset, batch_size=4, resize=(600, 600), use_mosaic=True, anchors_assigner=anchors_assigner, shuffle=False, num_workers=1)
for i, (img, box, lab, ids) in enumerate(loader):
    print(ids)
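For every batch without a problem, the image names are printed.
Keep going until the error is raised. Then open your dataset's training .txt file and look at the entries right after the last printed names. With batch_size=4, the problematic image should be among the next four.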
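A complementary check is to bypass the DataLoader and index the dataset one sample at a time, recording exceptions and images whose type or dtype differs from the rest. The sketch below uses a hypothetical helper, find_bad_samples. It assumes the dataset supports len() and integer indexing and returns the four-item tuple described at the top of the thread. In the demo, use_mosaic is an option of dataloader rather than of the dataset, so this scan does not exercise the mosaic path. With random augmentations, a sample can also pass on one pass and fail on another.

```python
import numpy as np


def find_bad_samples(dataset, expected_dtype=None):
    """Index a dataset sample by sample and report suspicious items.

    If expected_dtype is None, the first successfully loaded image sets it.
    Returns a list of (index, image_name or None, reason).
    """
    bad = []
    for i in range(len(dataset)):
        try:
            image, boxes, labels, image_name = dataset[i]
        except Exception as exc:  # loading or augmentation itself failed
            bad.append((i, None, f"{type(exc).__name__}: {exc}"))
            continue
        if not isinstance(image, np.ndarray):
            bad.append((i, image_name, f"image is {type(image).__name__}, expected numpy.ndarray"))
            continue
        if expected_dtype is None:
            expected_dtype = image.dtype  # reference dtype from the first good sample
        if image.dtype != expected_dtype:
            bad.append((i, image_name, f"dtype {image.dtype}, expected {np.dtype(expected_dtype)}"))
    return bad
```

Because it reports every suspicious index in one pass, it doesn't stop at the first failure the way the DataLoader loop does.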
OK, I'll give it a try.
I tried it and found that the error only happens when use_mosaic is enabled. Without use_mosaic there are no problems at all.
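That finding fits the "Float can't be cast to ... Byte" error in the traceback: some images in the failing batch were float32 while the first was uint8. The thread doesn't show how Changeable builds its mosaic. A typical way a mosaic path ends up with a different dtype is by allocating or blending the canvas in floating point. For example, np.zeros defaults to float64, and np.full with an integer fill defaults to int64. The sketch below is a simplified 2x2 mosaic using a hypothetical helper, simple_mosaic. It has no random centre, resizing, or cropping, and assumes four equally sized images. It allocates the canvas with the input's dtype and shifts the boxes into canvas coordinates.

```python
import numpy as np


def simple_mosaic(images, boxes_list, fill=114):
    """Tile four equally sized H x W x C images into a 2H x 2W mosaic.

    boxes_list holds one (N, 4) xyxy pixel array per image; boxes are
    shifted into mosaic coordinates. Returns (canvas, boxes).
    """
    assert len(images) == 4 and len(boxes_list) == 4
    h, w, c = images[0].shape
    # Use the input dtype explicitly: without dtype=, np.full with an int
    # fill gives int64 (and np.zeros gives float64), changing the batch dtype.
    canvas = np.full((2 * h, 2 * w, c), fill, dtype=images[0].dtype)
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]  # (x0, y0): top-left, top-right, bottom-left, bottom-right
    out_boxes = []
    for img, boxes, (x0, y0) in zip(images, boxes_list, offsets):
        assert img.shape == (h, w, c), "sketch assumes equally sized images"
        canvas[y0:y0 + h, x0:x0 + w] = img
        shifted = np.asarray(boxes, dtype=np.float32).reshape(-1, 4).copy()
        shifted[:, [0, 2]] += x0  # x coordinates
        shifted[:, [1, 3]] += y0  # y coordinates
        out_boxes.append(shifted)
    return canvas, np.concatenate(out_boxes, axis=0)
```

Whatever the real implementation does, the key point is that mosaic and non-mosaic samples must leave the dataset with the same dtype. Alternatively, cast every image to one dtype before collating.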
I've sent my contact details to your email.