huanghoujing/AlignedReID-Re-Production-Pytorch

Re-Ranking MemoryError

Closed this issue · 5 comments

How can I solve this problem?
Single GPU: GTX 1060 (6 GB), Ubuntu 18.04 64-bit, 8 GB RAM.
python script/experiment/train.py \
-d '(0,)' \
-r 1 \
--dataset market1501 \
--ids_per_batch 16 \
--ims_per_id 4 \
--normalize_feature false \
-gm 0.3 \
-glw 1 \
-llw 0 \
-idlw 0 \
--base_lr 2e-4 \
--lr_decay_type exp \
--exp_decay_at_epoch 151 \
--total_epochs 300

Computing scores for Global Distance...
[mAP: 70.85%], [cmc1: 85.75%], [cmc5: 94.42%], [cmc10: 96.26%]
Done, 11.22s
Re-ranking...
Traceback (most recent call last):
  File "script/experiment/train.py", line 632, in <module>
    main()
  File "script/experiment/train.py", line 628, in main
    test(load_model_weight=False)
  File "script/experiment/train.py", line 386, in test
    use_local_distance=use_local_distance)
  File "./aligned_reid/dataset/TestSet.py", line 240, in eval
    global_q_g_dist, global_q_q_dist, global_g_g_dist)
  File "./aligned_reid/utils/re_ranking.py", line 47, in re_ranking
    initial_rank = np.argsort(original_dist).astype(np.int32)
MemoryError

[Screenshot: 2018-11-09 16-01-59]

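Hi, re-ranking is done on the CPU. I can't tell what is causing this MemoryError, but even if it is caused by running out of memory, that would be system RAM, not GPU memory.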
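A rough estimate shows why 8 GB of RAM can run out at the `np.argsort` line in the traceback. The sketch below assumes that re-ranking builds one N × N distance matrix over all query and gallery images (N = num_query + num_gallery) stored as float32. It counts only three arrays that exist at the same moment on that line: the distance matrix, the int64 index array that `np.argsort` returns on 64-bit Linux, and the int32 copy made by `.astype(np.int32)`. The function name is hypothetical, and the result is a lower bound because re-ranking allocates other N × N arrays as well.

```python
def argsort_peak_gib(num_query: int, num_gallery: int) -> float:
    """Lower-bound estimate (GiB) of RAM held during
    `np.argsort(original_dist).astype(np.int32)` in re-ranking.

    Hypothetical helper; assumes an N x N float32 distance matrix.
    """
    n = num_query + num_gallery
    cells = n * n
    dist_bytes = cells * 4     # original_dist stored as float32 (assumption)
    argsort_bytes = cells * 8  # np.argsort result: int64 indices on 64-bit Linux
    rank_bytes = cells * 4     # the .astype(np.int32) copy
    return (dist_bytes + argsort_bytes + rank_bytes) / 2**30


if __name__ == "__main__":
    # Illustrative size only: about 20,000 query + gallery images in total.
    print(f"{argsort_peak_gib(4_000, 16_000):.2f} GiB")
```

For N = 20,000 this already comes to about 5.96 GiB, before counting the other arrays, the Python process itself, or the operating system, which is consistent with an 8 GB machine failing at that line while a 16 GB machine succeeds.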

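It was my RAM that was insufficient. Our lab just got a new machine with a GTX 1080 Ti (11 GB GPU memory) and 16 GB of RAM. Here are the results from running on it: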
[Screenshot of the results]

Hi @PayneYong,
Did you solve this issue? How?
Thanks for the help.

Hi, it was caused by the memory of my machine. I ran the project on another machine and it succeeded: the previous machine had 8 GB of RAM, and the new one has 16 GB. I also ran the mutual learning example, and it succeeded too. The memory I mean is CPU memory (RAM), not GPU memory, because re-ranking runs on the CPU. Although I solved the problem by switching machines, I think you could also solve it by changing the re-ranking code or by increasing your machine's virtual memory (swap). Forgive my poor English! Hope this helps!
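One way to change the re-ranking code at the line that fails is to sort the distance matrix a block of rows at a time and write the indices straight into a preallocated int32 array, so the full-size int64 temporary from `np.argsort` never exists. This is a minimal sketch, assuming `original_dist` is a 2-D NumPy array as in the traceback; `chunked_argsort_int32` and `chunk_rows` are hypothetical names, not part of the repository.

```python
import numpy as np


def chunked_argsort_int32(dist: np.ndarray, chunk_rows: int = 1024) -> np.ndarray:
    """Row-wise argsort of a 2-D array, returned as int32.

    Hypothetical helper: should give the same indices as
    np.argsort(dist).astype(np.int32), but each np.argsort call only sees
    `chunk_rows` rows, so its int64 output is chunk_rows x N, not N x N.
    """
    n_rows, n_cols = dist.shape
    # int32 indices can address at most 2**31 - 1 columns.
    if n_cols > np.iinfo(np.int32).max:
        raise ValueError("too many columns for int32 indices")
    out = np.empty((n_rows, n_cols), dtype=np.int32)
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        # Sort this block along each row; the int64 result is cast to int32 on assignment.
        out[start:stop] = np.argsort(dist[start:stop], axis=1)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = rng.random((5, 5), dtype=np.float32)
    print(chunked_argsort_int32(d, chunk_rows=2))
```

In `re_ranking.py` this would mean replacing `initial_rank = np.argsort(original_dist).astype(np.int32)` with `initial_rank = chunked_argsort_int32(original_dist)`. Each row is sorted independently either way, so the ranking should not change, while the extra peak memory on that line drops from an N × N int64 array to a `chunk_rows` × N one.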

Hi @PayneYong,
Thanks for the quick response.
Yes, you are right: this is a RAM issue.
My dataset is too big for my machine.
I'll look for other implementations that work with large datasets.
Thanks
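For larger datasets, a further saving is possible if the re-ranking code only ever reads the first few columns of `initial_rank`. In the commonly used k-reciprocal re-ranking code, the loops read at most `initial_rank[i, :k1 + 1]` and `initial_rank[i, :k2]`. That is an assumption to check against this repository's `re_ranking.py`. Under it, only the nearest `max(k1 + 1, k2)` neighbours per row need to be kept, and `np.argpartition` can find them without a full sort. The helper name below is hypothetical.

```python
import numpy as np


def topk_rank_int32(dist: np.ndarray, k: int, chunk_rows: int = 1024) -> np.ndarray:
    """Return the indices of the k smallest entries in each row of `dist`,
    sorted nearest-first, as an int32 array of shape (n_rows, k).

    Hypothetical helper: equivalent to np.argsort(dist)[:, :k] up to the
    ordering of tied distances, but never materialises an N x N index array.
    """
    n_rows, n_cols = dist.shape
    k = min(k, n_cols)
    out = np.empty((n_rows, k), dtype=np.int32)
    for start in range(0, n_rows, chunk_rows):
        block = dist[start:start + chunk_rows]
        # Unordered indices of the k smallest distances in each row.
        part = np.argpartition(block, k - 1, axis=1)[:, :k]
        # Order those k candidates by their distance, nearest first.
        order = np.argsort(np.take_along_axis(block, part, axis=1), axis=1)
        out[start:start + chunk_rows] = np.take_along_axis(part, order, axis=1)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = rng.random((6, 10), dtype=np.float32)
    print(topk_rank_int32(d, k=3, chunk_rows=4))
```

This stores an N × k int32 array instead of N × N. The distance matrix and any other N × N arrays that re-ranking allocates still remain, so it reduces the quadratic memory cost but does not remove it.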