cwq159/PyTorch-Spiking-YOLOv3

question about ann_to_snn generating snn_dag

shirleyatgithub opened this issue · 25 comments

Dear Author,
Thanks for sharing the code. I ran into a problem when running ann_to_snn.py and wonder whether you have encountered it too.
The error message is as follows:
"
  File "/home/gss/PyTorch-Spiking-YOLOv3-main/ann_parser.py", line 102, in relu_wrapper
    in_nodes = [find_node_by_tensor(inp)]
  File "/home/gss/PyTorch-Spiking-YOLOv3-main/ann_parser.py", line 37, in find_node_by_tensor
    raise ValueError("cannot find tensor Size", tensor.size())
ValueError: ('cannot find tensor Size', torch.Size([1, 16, 416, 416]))
"
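In ann_parser.py, find_node_by_tensor requires "v is tensor". In Python, "is" tests object identity: it is true only when both names refer to the very same object, not merely to tensors with the same shape or values. When the ReLU layer is added, its input does not satisfy this condition, so the result (rst) is empty.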
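The identity requirement described above can be reproduced without PyTorch. The sketch below uses a hypothetical FakeTensor class and a simplified find_node_by_identity helper (neither is from the repository; the real parser stores torch tensors) to show that an identity-based lookup fails for a new object even when its shape matches.

```python
class FakeTensor:
    """Hypothetical placeholder for a torch tensor; it only carries a shape."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def size(self):
        return self.shape


def find_node_by_identity(nodes, tensor):
    """Simplified, hypothetical stand-in for an identity-based lookup.

    Matches only when a node's stored output *is* the given object;
    an equal shape is not enough.
    """
    rst = [name for name, v in nodes.items() if v is tensor]
    if not rst:
        raise ValueError("cannot find tensor Size", tensor.size())
    return rst[0]


conv_out = FakeTensor((1, 16, 416, 416))
nodes = {"conv1_out1": conv_out}

# The very same object is found.
print(find_node_by_identity(nodes, conv_out))  # conv1_out1

# A different object with the same shape (for example, one returned by an
# operation that allocates a new tensor) is not found.
relu_in = FakeTensor((1, 16, 416, 416))
try:
    find_node_by_identity(nodes, relu_in)
except ValueError as err:
    print(err)  # ('cannot find tensor Size', (1, 16, 416, 416))
```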
I printed the ids of the tensors in this function and got the following output:
conv1 inp id 140221914107048
find node by tensor dag_input0 torch.Size([1, 3, 416, 416]) torch.Size([1, 3, 416, 416]) 140221914107048 140221914107048
add node conv1: ['dag_input0']->['conv1_out1']
conv1 out id 140221914107336
find node by tensor dag_input0 torch.Size([1, 16, 416, 416]) torch.Size([1, 3, 416, 416]) 140221914107336 140221914107048
find node by tensor conv1_out1 torch.Size([1, 16, 416, 416]) torch.Size([1, 16, 416, 416]) 140221914107336 140221914107336
batch_norm1 inp id 140221914107336
batch_norm1 out id 140221914155120
relu1 inp id 140221914106976
find node by tensor dag_input0 torch.Size([1, 16, 416, 416]) torch.Size([1, 3, 416, 416]) 140221914106976 140221914107048
find node by tensor conv1_out1 torch.Size([1, 16, 416, 416]) torch.Size([1, 16, 416, 416]) 140221914106976 140221914155120

I don't understand why the ids change along the flow. The only change I made was setting classes from 80 to 1 and, accordingly, filters from 255 to 18 in the config file "yolov3-tiny-mp2conv-mp1none-lk2relu-up2tconv.cfg". The ANN built from this config file trains and tests successfully.
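The 255 to 18 change follows the usual YOLOv3 rule for the convolution that feeds each [yolo] layer: with 3 anchors per detection scale, each anchor predicts 4 box coordinates, 1 objectness score and one score per class. A small helper (the name is illustrative, not from the repository) makes the arithmetic explicit:

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Output channels of the conv layer feeding a YOLOv3 [yolo] layer.

    Each anchor predicts 4 box coordinates + 1 objectness score
    + num_classes class scores.
    """
    return anchors_per_scale * (4 + 1 + num_classes)


print(yolo_head_filters(80))  # 255 (COCO)
print(yolo_head_filters(1))   # 18 (single class)
```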
Looking forward to your reply. @cwq159

@shirleyatgithub
Hello, which PyTorch version are you using?

$ python ann_to_snn.py --cfg cfg/yolov3-tiny.cfg --data data/coco.data --weights weights/best.pt --timesteps 128

I encountered this problem: "ValueError: ('cannot find tensor Size', torch.Size([16, 16, 320, 320])) "
Also, I can't find a PyTorch 1.3.0 release to install.
Do you have this problem? Looking forward to your reply.

@buaa-luzhi Yes, it seems to be the same problem. I use torch 1.7.1. Any idea how to solve it?

@shirleyatgithub
See #5. But I couldn't find a PyTorch 1.3 release.

@buaa-luzhi Why use PyTorch 1.3? The requirements.txt specifies torch>=1.6.0.

@shirleyatgithub
I don't know; I was following #5.

@shirleyatgithub
I tried PyTorch 1.7.1 and 1.4 and still had this problem.

@buaa-luzhi I couldn't find torch 1.3 either, so I tried the CPU build of torch 1.4 with Python 3.7. That error did not appear, but another one did:
  ann_parser.py", line 221, in parse_ann_model
    model(*warpped_input)
  File "/home/gss/anaconda3/envs/nlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "ann_to_snn.py", line 65, in forward
    x = self.listi
  File "/home/gss/anaconda3/envs/nlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'out'

@shirleyatgithub
pip install torch==1.3.1+cu100 torchvision==0.4.2+cu100 -f https://download.pytorch.org/whl/torch_stable.html

@shirleyatgithub
I'm still testing.

@shirleyatgithub
I still get this error, and I don't know what to change.


Thank you, I will try too

Please use PyTorch 1.3 with Python 3.7 for this version.
A new version supporting PyTorch 1.7+ will be released soon.
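Since several people in this thread tried different versions, it may help to confirm which PyTorch and Python versions are actually active before rerunning. A minimal sketch comparing version strings against the recommendation above (PyTorch 1.3 with Python 3.7); the helper names are hypothetical, and torch.__version__ is the standard attribute to pass in:

```python
import re
import sys


def major_minor(version):
    """Return (major, minor) from strings like '1.3.1+cu100' or '3.7.9'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_recommended(torch_version, python_version=None):
    """True if the versions match the advice here: PyTorch 1.3 with Python 3.7."""
    if python_version is None:
        # Default to the interpreter running this script.
        python_version = "{}.{}".format(*sys.version_info[:2])
    return major_minor(torch_version) == (1, 3) and major_minor(python_version) == (3, 7)


print(matches_recommended("1.3.1+cu100", "3.7"))  # True
print(matches_recommended("1.7.1", "3.7"))        # False
```

In a real environment, after import torch, calling matches_recommended(torch.__version__) checks the installed build against the running interpreter.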

@cwq159 @shirleyatgithub
I used PyTorch 1.3 and Python 3.7 and still get this error.
I wonder whether cfg/yolov3-tiny-mp2conv-mp1none-lk2relu-up2tconv.cfg should be used during the training phase, because I couldn't find cfg/yolov3-tiny-ours.cfg.
Thanks so much, and looking forward to your reply!

@cwq159 @shirleyatgithub
(1) Training:
python3 train.py --batch-size 32 --cfg cfg/yolov3-tiny-mp2conv-mp1none-lk2relu-up2tconv.cfg --data data/coco.data --weights ''
(2) Conversion:
python3 ann_to_snn.py --cfg cfg/yolov3-tiny-mp2conv-mp1none-lk2relu-up2tconv.cfg --data data/coco.data --weights weights/best.pt --timesteps 128

Is something wrong with training this way?
The error reappears:
ValueError: ('cannot find tensor Size', torch.Size([16, 16, 640, 640]))

Thanks so much, and looking forward to your reply!

That error is gone now.
However, as the number of timesteps grows, a memory error occurs.
My GPU has only 6 GB of memory; with batch_size 1 and timesteps=32, I still get a GPU out-of-memory error.

@cwq159
Hello, when will the PyTorch 1.7 version be released?
Thanks!

If you want to increase timesteps, you need a GPU with enough memory.
In this version, the input data is copied once per timestep, and the SNN computes the output for every copy, so memory usage grows with the number of timesteps and the GPU memory must be large enough.
The new version will try to optimize the IF (integrate-and-fire) operation to reduce memory usage and will support PyTorch 1.7+. Please stay tuned.
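The scaling described above can be estimated with a back-of-the-envelope calculation. The sketch assumes float32 tensors and the 1x3x416x416 input and 1x16x416x416 first-layer output reported earlier in this thread, and counts only those two tensors (real usage is higher once every layer's activations are included). For contrast, it also shows a generic integrate-and-fire neuron that consumes timesteps one at a time and carries only its membrane potential; reset by subtraction is an assumption here, and this is neither the repository's implementation nor a description of the planned optimization.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4


def replicated_mib(shape, timesteps, bytes_per_elem=BYTES_PER_FLOAT32):
    """MiB needed to hold one tensor of `shape` copied once per timestep."""
    numel = reduce(mul, shape, 1)
    return numel * timesteps * bytes_per_elem / 2**20


# Assumed shapes: network input and first conv output at 416x416.
print(replicated_mib((1, 3, 416, 416), 32))   # 63.375
print(replicated_mib((1, 16, 416, 416), 32))  # 338.0


def if_neuron_stream(inputs, threshold=1.0):
    """Integrate-and-fire with reset by subtraction, one timestep at a time.

    Only the membrane potential `v` is carried between steps, so the
    neuron's state does not grow with the number of timesteps.
    """
    v = 0.0
    for x in inputs:
        v += x
        if v >= threshold:
            v -= threshold
            yield 1
        else:
            yield 0


spikes = list(if_neuron_stream([0.25] * 8))
print(spikes)                     # [0, 0, 0, 1, 0, 0, 0, 1]
print(sum(spikes) / len(spikes))  # 0.25
```

The firing rate over the window approximates the constant input, which is the basis of rate-coded ANN-to-SNN conversion; more timesteps give a finer approximation at the cost of more computation.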

@cwq159
Hello, sorry to trouble you again!
Which GPU do you use?
My GPU memory is small, and I want to replace it with a new card.
Thanks again.

RTX 8000 with 48 GB of memory

Thanks, that is great! About how long until the new code is released?


@buaa-luzhi Excuse me, how did you solve this error: ValueError: ('cannot find tensor Size', torch.Size([16, 16, 640, 640]))