mindspore-lab/mindyolo

A bug with YOLOv5 and YOLOv8 in Ascend training that looks like a MindSpore code issue: For Stack, all types should be same, but got (mindspore.tensor[float32], mindspore.tensor[float32], mindspore.tensor[float32], mindspore.float32)

doubtfire009 opened this issue · 4 comments

I am using Ascend (8*ascend-snt9b) to test a detection task on a custom dataset, with YOLOv5 and YOLOv8.

The dataset was generated following https://github.com/mindspore-lab/mindyolo/blob/master/examples/finetune_car_detection/README.md. Images and labels must correspond one-to-one, and their counts must match.
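The one-to-one requirement can be checked before training. The following is only a minimal sketch: it assumes images and labels live in two separate directories and are paired by file stem (`0001.jpg` with `0001.txt`), and the helper name `find_unpaired` and the list of image extensions are hypothetical, not part of mindyolo.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the dataset actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unpaired(image_dir, label_dir):
    """Return (image stems without a label, label stems without an image)."""
    image_stems = {
        p.stem for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {
        p.stem for p in Path(label_dir).iterdir()
        if p.suffix.lower() == ".txt"
    }
    # Both lists are empty only when every image has a label and vice versa.
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

If both returned lists are empty, the image and label counts necessarily match as well.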

During training I ran into this error:
Traceback (most recent call last):
File "/home/ma-user/work/mindyolo/train.py", line 320, in
train(args)
File "/home/ma-user/work/mindyolo/train.py", line 275, in train
trainer.train(
File "/home/ma-user/work/mindyolo/mindyolo/utils/trainer_factory.py", line 170, in train
run_context.loss, run_context.lr = self.train_step(imgs, labels, segments,
File "/home/ma-user/work/mindyolo/mindyolo/utils/trainer_factory.py", line 366, in train_step
loss, loss_item, _, grads_finite = self.train_step_fn(imgs, labels, True)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/common/api.py", line 718, in staging_specialize
out = _MindsporeFunctionExecutor(func, hash_obj, input_signature, process_obj, jit_config)(*args, **kwargs)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/common/api.py", line 121, in wrapper
results = fn(*arg, **kwargs)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/common/api.py", line 350, in call
raise err
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/common/api.py", line 344, in call
phase = self.compile(self.fn.name, *args_list, **kwargs)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/common/api.py", line 435, in compile
is_compile = self._graph_executor.compile(self.fn, compile_args, kwargs, phase, True)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/ops/operations/array_ops.py", line 3036, in infer
all_shape = _get_stack_shape(value, x_shape, x_type, self.axis, self.name)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/ops/operations/array_ops.py", line 2954, in _get_stack_shape
raise TypeError(f"For {prim_name}, all types should be same, but got {x_type}")
TypeError: For Stack, all types should be same, but got (mindspore.tensor[float32], mindspore.tensor[float32], mindspore.tensor[float32], mindspore.float32)

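I added logging to check the input data types and found nothing wrong with the inputs, so could the Ascend engineers please help check what the problem is?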

[screenshot]

The training command is:
python train.py --config ./configs/yolov5/yolov5_frame.yaml --device_target Ascend --ms_mode 1

The mindyolo suite currently supports training in graph mode only. Try removing the `--ms_mode 1` argument.

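[screenshot]
The same problem occurs without ms_mode. In addition, in utils.py I commented out the line shown here:
[screenshot]
because https://gitee.com/mindspore/mindspore/issues/IAEDX0 says that jit is only supported on MindSpore 2.3 and later.

If I do not comment it out, this error appears:
[screenshot]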
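Since the linked Gitee issue is cited as saying jit needs MindSpore 2.3 or later, a version string can be compared against that minimum before deciding whether to keep the jit line. This is only a sketch under that assumption: the 2.3 threshold is taken from this thread, not verified here, and the helper names `version_tuple` and `meets_jit_minimum` are hypothetical.

```python
import re

# Minimum MindSpore version for jit, as reported in the linked Gitee issue
# (an assumption taken from this thread, not verified independently).
JIT_MIN_VERSION = (2, 3)


def version_tuple(version_string, parts=2):
    """Turn a string such as '2.1.0' or '2.3.0rc1' into a tuple like (2, 1)."""
    numbers = []
    for piece in version_string.split(".")[:parts]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < parts:  # pad short strings such as '2' to (2, 0)
        numbers.append(0)
    return tuple(numbers)


def meets_jit_minimum(mindspore_version):
    """True if the given version string is at least JIT_MIN_VERSION."""
    return version_tuple(mindspore_version) >= JIT_MIN_VERSION
```

In a real environment the argument would typically be `mindspore.__version__`. For the version reported in this thread, `meets_jit_minimum("2.1.0")` is `False`.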

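I tested further on my own and found two causes: 1. The environment and the mindyolo version must match. I used mindyolo 0.2.0 with mindspore_2.1.0-cann_6.3.2-py_3.7-euler_2.8.3-aarch64-d910. 2. [screenshot] One line of code needs to be added, as shown in the screenshot; after that, training runs.

[screenshot]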

I resolved this issue through my own adjustments.