d-li14/involution

bug

Dontfall opened this issue · 37 comments

Hello, author!
I successfully used det/mmdet/models/utils/involution_naive.py in YOLO. However, I ran into trouble when using involution_cuda.py.
I solved my previous problem (https://github.com/d-li14/involution/issues/15), but now I hit a new one:
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 1154 models.experimental.involution [32, 7, 2]
2 -1 1 16768 models.common.C3 [32, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 156928 models.common.C3 [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1182720 models.common.C3 [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 288 layers, 7257151 parameters, 7257151 gradients, 16.1 GFLOPS

Scaled weight_decay = 0.0005
Optimizer groups: 63 .bias, 63 conv.weight, 59 other
train: Scanning '../coco128/labels/train2017.cache' for images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|████| 128/128 [00:00<?, ?it/s]
Plotting labels...
val: Scanning '../coco128/labels/train2017.cache' for images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|██████| 128/128 [00:00<?, ?it/s]
train: Scanning '../coco128/labels/train2017.cache' for images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|████| 128/128 [00:00<?, ?it/s]
val: Scanning '../coco128/labels/train2017.cache' for images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|██████| 128/128 [00:00<?, ?it/s]
val: Scanning '../coco128/labels/train2017.cache' for images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|██████| 128/128 [00:00<?, ?it/s]

autoanchor: Analyzing anchors... anchors/target = 4.26, Best Possible Recall (BPR) = 0.9946
Image sizes 640 train, 640 test
Using 2 dataloader workers
Logging results to runs/train/exp22
Starting training for 300 epochs...

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size

0%| | 0/64 [00:05<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 532, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 297, in train
pred = model(imgs) # forward
File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
TypeError: __init__() missing 3 required positional arguments: 'source', 'name', and 'options'

I would appreciate any suggestions on how to fix this. Thanks!

My environment: PyTorch 1.7.0 (GPU), mmcv-full 1.2.7, CUDA 10.2.
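
The final TypeError is probably raised by the re-raise step, not by the original failure. At `torch/_utils.py` line 428, the traceback shows the worker thread's exception being rebuilt as `self.exc_type(msg)`. If the original exception class needs more constructor arguments than a single message, that rebuild fails. The missing `source`, `name` and `options` arguments appear to match the constructor of CuPy's `CompileException`, which is what a failed CUDA kernel compile in the CuPy-based involution_cuda.py would raise. Treat that attribution as an assumption. The pure-Python sketch below reproduces the masking with a hypothetical exception class (`CompileLikeError`) and a simplified wrapper (`WrappedError`). Neither is the real PyTorch or CuPy code.

```python
class CompileLikeError(Exception):
    """Hypothetical exception whose constructor needs extra arguments, mimicking the
    (msg, source, name, options) signature suspected to belong to CuPy's CompileException."""

    def __init__(self, msg, source, name, options):
        super().__init__(msg)
        self.source, self.name, self.options = source, name, options


class WrappedError:
    """Simplified sketch of how an exception from a worker thread is carried back:
    only the exception type and a formatted message survive, not the original object."""

    def __init__(self, exc):
        self.exc_type = type(exc)
        self.msg = f"Caught {self.exc_type.__name__} in replica 0.\nOriginal: {exc}"

    def reraise(self):
        # Rebuilding the exception from one string breaks for classes like
        # CompileLikeError: the constructor call itself raises TypeError.
        raise self.exc_type(self.msg)


def what_main_thread_sees(exc):
    """Return (type name, message) of the exception that actually reaches the caller."""
    try:
        WrappedError(exc).reraise()
    except Exception as raised:
        return type(raised).__name__, str(raised)


if __name__ == "__main__":
    kernel_error = CompileLikeError("nvrtc: error", source="", name="kernel.cu", options=())
    print(what_main_thread_sees(kernel_error))        # a TypeError, not the compile error
    print(what_main_thread_sees(ValueError("boom")))  # ordinary exceptions survive intact
```

If this reading is right, running on a single GPU (so that DataParallel is not used, e.g. `--device 0` in YOLOv5's train.py) should let the original exception and its compiler message through.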

I also added it to YOLO, but my loss shows NaN. How can I fix this?

@Dontfall Hello! I am replacing the Conv module in YOLO with involution using involution_naive.py, but I keep getting an error. Do I need any other files from the involution project?
File "D:\win10 pytorch17\yolov5-master1\utils\involution.py", line 21, in __init__
self.conv1 = ConvModule(
TypeError: __init__() got an unexpected keyword argument 'k'
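
The traceback shows that involution_naive.py builds `conv1` with mmcv's `ConvModule`, whose parameters use spelled-out names (`in_channels`, `out_channels`, `kernel_size`, `stride`, `padding`, ...). The error is consistent with a YOLOv5-style short name such as `k=` reaching that constructor, but the traceback alone does not prove it. The sketch below uses a hypothetical stub with ConvModule's leading parameter names and `inspect.signature(...).bind` to check keyword arguments without calling anything. The short-name mapping is likewise an illustrative assumption.

```python
import inspect


def conv_module_stub(in_channels, out_channels, kernel_size, stride=1, padding=0,
                     conv_cfg=None, norm_cfg=None, act_cfg=None):
    """Hypothetical stand-in mirroring the leading parameter names of mmcv's ConvModule."""
    return dict(in_channels=in_channels, out_channels=out_channels,
                kernel_size=kernel_size, stride=stride, padding=padding)


# YOLOv5-style short names -> mmcv-style names (assumed mapping, for illustration only).
YOLO_TO_MMCV = {"c1": "in_channels", "c2": "out_channels", "k": "kernel_size",
                "s": "stride", "p": "padding"}


def translate_kwargs(**yolo_kwargs):
    """Rename short keyword arguments; unknown names pass through unchanged."""
    return {YOLO_TO_MMCV.get(name, name): value for name, value in yolo_kwargs.items()}


def accepts(func, **kwargs):
    """True if func's signature would accept these keyword arguments (func is not called)."""
    try:
        inspect.signature(func).bind(**kwargs)
        return True
    except TypeError:  # unexpected or missing arguments
        return False


if __name__ == "__main__":
    print(accepts(conv_module_stub, in_channels=32, out_channels=8, k=1))  # False
    print(accepts(conv_module_stub, **translate_kwargs(c1=32, c2=8, k=1)))  # True
```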

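@Dontfall OK, thanks. But the output channel count in my involution_naive.py is 54, while the corresponding channel count in YOLO is 32, so they don't match. How is this usually resolved?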
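
Involution keeps the channel count: the block returns as many channels as it receives. The other channel count inside the naive module belongs to the kernel-generation branch (`conv2`). That branch outputs K²·G channels, with G = C / group_channels. 54 = 3²·6 would fit K = 3 and G = 6, so the 54 is likely that branch rather than the block output. The sketch below computes these counts, assuming the defaults hard-coded in involution_naive.py (group_channels = 16, reduction_ratio = 4). As a check, for `[32, 7, 2]` it gives 1154 parameters, matching layer 1 of the model summary in the first post.

```python
def involution_shapes(channels, kernel_size, group_channels=16, reduction_ratio=4):
    """Channel and parameter counts of the naive involution block, assuming
    involution_naive.py's defaults (group_channels=16, reduction_ratio=4)."""
    groups = channels // group_channels
    reduced = channels // reduction_ratio
    kernel_gen = kernel_size ** 2 * groups  # conv2 output: one K*K kernel per group and pixel
    params = (
        channels * reduced        # conv1: 1x1 conv without bias (it is followed by BN)
        + 2 * reduced             # BN weight and bias
        + reduced * kernel_gen    # conv2: 1x1 conv weights
        + kernel_gen              # conv2 bias (no norm layer after it)
    )
    # "output" is the block's output channel count: involution does not change it.
    return {"groups": groups, "kernel_gen": kernel_gen, "output": channels, "params": params}


if __name__ == "__main__":
    print(involution_shapes(32, 7))  # layer 1 of the summary: 1154 params, output stays 32
    print(involution_shapes(96, 3)["kernel_gen"])  # 54 = 3*3 * 6 groups
```

If YOLO expects a different channel count after the involution, a 1×1 convolution after it is the usual way to change the width.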

@Dontfall If I use involution_cuda.py instead, is that single file also all I need?

@Dontfall Hi, I keep running into problems and don't know how to deal with them. Could I ask you for help?
out = self.unfold(x).view(b, self.g, self.gc, self.k ** 2, h, w)
RuntimeError: shape '[1, 2, 24, 9, 128, 128]' is invalid for input of size 1769472
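
The numbers in this error can be checked by hand. `nn.Unfold` returns a tensor of shape (b, c·k·k, L). L is the number of sliding positions, and per spatial dimension it equals floor((size + 2p − d(k − 1) − 1) / s) + 1. The helper below applies that formula. One reading reproduces both numbers exactly, assuming a 48-channel (2 groups × 24), 128×128 input with k = 3 and padding 1: the unfold ran with stride 2 (64×64 positions), while h and w came from a kernel-generation branch computed at stride 1 (128×128). If that holds, both paths must use the same stride. involution_naive.py, for example, average-pools the branch input when stride > 1.

```python
import math


def unfold_numel(b, c, h, w, k, s=1, p=0, d=1):
    """Element count and output grid of nn.Unfold(k, dilation=d, padding=p, stride=s)
    on a (b, c, h, w) tensor: shape (b, c*k*k, out_h*out_w), where each spatial size
    is floor((size + 2p - d*(k-1) - 1) / s) + 1."""
    out_h = (h + 2 * p - d * (k - 1) - 1) // s + 1
    out_w = (w + 2 * p - d * (k - 1) - 1) // s + 1
    return b * c * k * k * out_h * out_w, (out_h, out_w)


if __name__ == "__main__":
    requested = math.prod([1, 2, 24, 9, 128, 128])  # shape passed to .view() in the error
    print("view() expects", requested)  # 7077888
    for stride in (1, 2):
        # Assumed input: 48 channels (2 groups x 24), 128 x 128, k=3, padding=1.
        numel, grid = unfold_numel(1, 2 * 24, 128, 128, k=3, s=stride, p=1)
        print(f"stride={stride}: unfold yields {numel} elements on a {grid} grid")
    # stride=1 -> 7077888 on (128, 128); stride=2 -> 1769472 on (64, 64)
```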

@Dontfall I have installed the mmcv library.

@Dontfall The installed version is mmcv 1.3.1.

@Dontfall I took a look, but I don't quite understand where the following code goes:
import torch
from models.experimental import Involution
m = Involution(ch=128, k=1, s=1) # ch_in, kernel, stride
x = torch.zeros(16, 128, 20, 20) # input
y = m(x) # forward

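@Dontfall Yes, I have seen that project. What I want to ask is this: experimental.py defines an Involution class, so if I want to replace a module with involution, do I just substitute the following?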

m = Involution(ch=128, k=1, s=1) # ch_in, kernel, stride
x = torch.zeros(16, 128, 20, 20) # input
y = m(x) # forward

@Dontfall Thanks! I modified the corresponding part of the .yaml file and call the Involution class from common.py accordingly, but I get this error:
Traceback (most recent call last):
File "D:/win10 pytorch17/yolov5-involution/train.py", line 531, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "D:/win10 pytorch17/yolov5-involution/train.py", line 78, in train
model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create
File "D:\win10 pytorch17\yolov5-involution\models\yolo.py", line 83, in __init__
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
File "D:\win10 pytorch17\yolov5-involution\models\yolo.py", line 240, in parse_model
m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args) # module
File "D:\win10 pytorch17\yolov5-involution\models\common.py", line 118, in __init__
self.conv = Involution(c1 * 4, k, s)
NameError: name 'Involution' is not defined
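
The NameError means common.py uses `Involution` without importing it. The obvious fix is `from models.experimental import Involution` at the top of common.py. That may replace the NameError with a circular-import error, assuming (as in upstream YOLOv5) experimental.py itself imports `Conv` from models.common. The sketch below writes two throwaway modules to a temporary directory to show both cases. The package names and module contents are hypothetical stand-ins for models/common.py and models/experimental.py. Importing inside `Focus.__init__` defers the lookup until both modules have finished loading.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for models/experimental.py: it depends on common.
EXPERIMENTAL = """
from {pkg}.common import Conv

class Involution:
    def __init__(self, ch):
        self.ch = ch  # involution keeps the channel count
"""

# common.py importing Involution at module level: creates an import cycle.
COMMON_TOP_LEVEL = """
from {pkg}.experimental import Involution

class Conv:
    pass

class Focus:
    def __init__(self, c1):
        self.conv = Involution(c1 * 4)
"""

# common.py importing Involution lazily, when Focus is constructed.
COMMON_DEFERRED = """
class Conv:
    pass

class Focus:
    def __init__(self, c1):
        from {pkg}.experimental import Involution  # runs after common is fully loaded
        self.conv = Involution(c1 * 4)
"""


def build_and_load(pkg, common_src):
    """Write a throwaway package, import it, and return Focus(3).conv.ch or the error type."""
    root = Path(tempfile.mkdtemp())
    (root / pkg).mkdir()
    (root / pkg / "__init__.py").write_text("")
    (root / pkg / "common.py").write_text(common_src.format(pkg=pkg))
    (root / pkg / "experimental.py").write_text(EXPERIMENTAL.format(pkg=pkg))
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        common = importlib.import_module(f"{pkg}.common")
        return common.Focus(3).conv.ch
    except ImportError as exc:
        return type(exc).__name__
    finally:
        sys.path.remove(str(root))


if __name__ == "__main__":
    print(build_and_load("demo_cycle_top", COMMON_TOP_LEVEL))      # ImportError
    print(build_and_load("demo_cycle_deferred", COMMON_DEFERRED))  # 12
```

Moving the Involution class into common.py itself is another way to avoid the cycle.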

@Dontfall Yes, they correspond.

@Dontfall Hi, I have checked several times and found no mismatch, so I think the problem is somewhere in my code. Suppose I replace the Conv module inside Focus with involution. Are the following changes correct?
1. In the .yaml file:
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   #[-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 1, Conv, [128, 1, 1]],

2. In common.py:
class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()
        # self.conv = Involution(c1, k, s)
        # self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
        self.conv = Involution(c1 * 4, k, s)

Is there anything else I need to change?
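
Involution maps C channels to C channels, so `Involution(c1 * 4, k, s)` inside Focus outputs 4·c1 channels (12 for an RGB input). Meanwhile, parse_model takes the Focus output width from the yaml's `c2` (64 after width scaling) and builds the next layer for that many inputs. The pure-Python sketch below reproduces Focus's space-to-depth slicing on nested lists to show where the factor of 4 comes from. The commented line at the end shows one hypothetical way to restore `c2` channels: a 1×1 Conv after the involution. It assumes YOLOv5's `Conv(c1, c2, k, s)` and the `Involution(ch, k, s)` signature of the port discussed in this thread.

```python
def focus_space_to_depth(x):
    """Space-to-depth as in YOLOv5's Focus; x is [C][H][W] nested lists with even H, W.

    Mirrors torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1):
    returns [4C][H/2][W/2], with all C channels of each slice kept together.
    """
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row start, column start) per slice
    out = []
    for row0, col0 in offsets:
        for channel in x:
            out.append([row[col0::2] for row in channel[row0::2]])
    return out


if __name__ == "__main__":
    rgb = [[[0] * 4 for _ in range(4)] for _ in range(3)]  # dummy 3 x 4 x 4 image
    y = focus_space_to_depth(rgb)
    print(len(y), len(y[0]), len(y[0][0]))  # 12 2 2: channels x4, spatial size /2
    # Involution keeps its 12 input channels, so Focus would emit 12, not c2.
    # Hypothetical fix inside Focus.__init__ (not run here):
    #   self.conv = nn.Sequential(Involution(c1 * 4, k, s), Conv(c1 * 4, c2, 1, 1))
```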

Hello, how should yolo.py be modified? I still can't get YOLOv5 to run.

@luyunfighting Hi, I got the CPU version working earlier, but never got the CUDA version to run, so I gave up. Good luck!