MendelXu/SAN

On the issue of self-created COCO-format datasets

chunguangqu opened this issue · 12 comments

Hello,
Thanks for your excellent work, but I have a question:
The data I annotated with labelme has four classes, and I have converted it to the stuffthingmaps format. Where do I need to make the corresponding modifications to train on my own dataset? Especially in the register_coco_stuff_164k.py file: my dataset only has 4 classes and does not have the 91 COCO-Stuff classes.
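For context, my conversion from the labelme JSON files to stuffthingmaps-style PNGs is roughly along these lines (a simplified sketch; the name-to-index mapping and the 255 ignore value follow the usual detectron2 convention, not code from this repo):

import json
from PIL import Image, ImageDraw

# Map my 4 labelme class names to contiguous indices 0..3.
NAME_TO_ID = {"oil cup": 0, "liquid oil": 1, "magnetic flap": 2, "liquid water": 3}

def labelme_to_stuffmap(json_path, png_path):
    with open(json_path) as f:
        ann = json.load(f)
    # Start from an all-ignore (255) map and rasterize each polygon with its class index.
    label = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 255)
    draw = ImageDraw.Draw(label)
    for shape in ann["shapes"]:
        draw.polygon([tuple(p) for p in shape["points"]], fill=NAME_TO_ID[shape["label"]])
    label.save(png_path)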

You can modify the function

def _get_coco_stuff_meta():

like this.
def _get_voc_meta(cat_list):

Only a list of category names is required.
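For reference, that helper just builds the metadata dict from a category list, i.e. something like:

def _get_voc_meta(cat_list):
    ret = {
        "stuff_classes": cat_list,
    }
    return ret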

I modified the function as follows:

# my 4 custom classes
CLASS_NAMES = (
    "oil cup",
    "liquid oil",
    "magnetic flap",
    "liquid water",
)

def _get_coco_stuff_meta(cat_list):
    ret = {
        "stuff_classes": cat_list,
    }
    return ret

def register_all_coco_stuff_164k(root):
    root = os.path.join(root, "coco")
    meta = _get_coco_stuff_meta(CLASS_NAMES)
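The rest of the function I kept following detectron2's standard semantic-segmentation registration; a fuller sketch (the dataset names and the stuffthingmaps_detectron2 subdirectories reflect my layout and are assumptions, not the repo's exact code):

import os
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import load_sem_seg

def register_all_coco_stuff_164k(root):
    root = os.path.join(root, "coco")
    meta = _get_coco_stuff_meta(CLASS_NAMES)
    for name, image_dirname, sem_seg_dirname in [
        ("train", "train2017", "stuffthingmaps_detectron2/train2017"),
        ("test", "val2017", "stuffthingmaps_detectron2/val2017"),
    ]:
        image_dir = os.path.join(root, image_dirname)
        gt_dir = os.path.join(root, sem_seg_dirname)
        all_name = f"coco_2017_{name}_stuff_sem_seg"
        # Each record pairs a .jpg image with its .png label map.
        DatasetCatalog.register(
            all_name,
            lambda x=image_dir, y=gt_dir: load_sem_seg(y, x, gt_ext="png", image_ext="jpg"),
        )
        MetadataCatalog.get(all_name).set(
            image_root=image_dir,
            sem_seg_root=gt_dir,
            evaluator_type="sem_seg",
            ignore_label=255,
            **meta,
        )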

But it still reports the following error:
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/quchunguang/detectron2/detectron2/engine/launch.py", line 123, in _distributed_worker
main_func(*args)
File "/home/quchunguang/003-large-model/SAN/train_net.py", line 274, in main
return trainer.train()
File "/home/quchunguang/detectron2/detectron2/engine/defaults.py", line 484, in train
super().train(self.start_iter, self.max_iter)
File "/home/quchunguang/detectron2/detectron2/engine/train_loop.py", line 155, in train
self.run_step()
File "/home/quchunguang/detectron2/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/home/quchunguang/detectron2/detectron2/engine/train_loop.py", line 492, in run_step
loss_dict = self.model(data)
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/quchunguang/003-large-model/SAN/san/model/san.py", line 206, in forward
losses = self.criterion(outputs, targets)
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/quchunguang/003-large-model/SAN/san/model/criterion.py", line 234, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/quchunguang/003-large-model/SAN/san/model/matcher.py", line 184, in forward
return self.memory_efficient_forward(outputs, targets)
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/quchunguang/003-large-model/SAN/san/model/matcher.py", line 127, in memory_efficient_forward
tgt_mask = point_sample(
File "/home/quchunguang/detectron2/projects/PointRend/point_rend/point_features.py", line 39, in point_sample
output = F.grid_sample(input, 2.0 * point_coords - 1.0, **kwargs)
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/functional.py", line 4223, in grid_sample
return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Have you taken a deeper look into the error? The problem is triggered by grid_sample; you should carefully check its input.
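To pinpoint it, run with CUDA_LAUNCH_BLOCKING=1 as the error message itself suggests, and assert-check the inputs right before the failing point_sample call (a debugging sketch; tgt_mask and point_coords are the tensor names from your traceback):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA initializes; the assert then fires at the real line

# Inside san/model/matcher.py, just before the failing point_sample(...) call:
# point_sample expects normalized coordinates in [0, 1] and rescales them
# to grid_sample's [-1, 1] range (the 2.0 * point_coords - 1.0 in your traceback).
assert point_coords.min() >= 0.0 and point_coords.max() <= 1.0
assert torch.isfinite(tgt_mask).all(), "non-finite values in target masks"
assert tgt_mask.numel() > 0, "empty target masks for this image"
# Note: with CUDA_LAUNCH_BLOCKING=1 the traceback may point somewhere else
# entirely, e.g. an out-of-range class index used for indexing in the loss.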

My dataset consists of 4 classes and does not have a stuff class. Do I only need to modify the number of categories in SAN/san/config.py and the _get_coco_stuff_meta() function in SAN/san/data/datasets/register_coco_stuff_164k.py? Do I need to modify the configuration parameters for detectron2 or any other files?

I think there should not be any other parameters to change. Could you share the modified code and the whole training log with me?

Uploading san-1109.zip…

Attached are the code and dataset I used. Could you please help me identify the issue?

The link seems invalid. It points to the current issue.

I sent you the download link for Baidu Netdisk:
Link: https://pan.baidu.com/s/182QmirMpXRqEhIIkAXIM6A?pwd=jow7
Extraction code: jow7

Sorry for the late reply. I think the issue is possibly that you are still using the COCO-Stuff dataset.
In line 212 of san/data/datasets/register_coco_stuff_164k.py, the root path still points to the coco dataset.

I don't understand what you mean.
As expressed in your code, the training set uses the COCO-Stuff format, while the validation sets use COCO-Stuff, Pascal VOC-20, Pascal Context-59, and so on. So my training set and validation set are both in COCO-Stuff format (the path is SAN/datasets/coco/stuffthingmaps_detectron2/). It's just that I have 4 object categories, and the 91 stuff categories are likewise reduced to 4. So where do I need to make modifications to ensure training runs normally?

So are you sure that the data used in training is correct? For example, are the category indices in the segmentation maps 0, 1, 2, 3? I think the bug should be easy to debug... Just add a breakpoint at the line where your training raised the error and check the data.
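A quick check like this over your ground-truth PNGs would confirm it (a sketch; the glob pattern assumes the stuffthingmaps_detectron2 path you mentioned):

import glob
import numpy as np
from PIL import Image

valid = {0, 1, 2, 3, 255}  # the 4 class indices plus the detectron2 ignore value
for path in glob.glob("datasets/coco/stuffthingmaps_detectron2/*/*.png"):
    values = set(np.unique(np.array(Image.open(path))).tolist())
    bad = values - valid
    if bad:
        print(f"{path}: unexpected label values {sorted(bad)}")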

Thank you for your patient reply; the problem has been solved.