nqanh/affordance-net

demo.py / Check failed: error == cudaSuccess (2 vs. 0) out of memory *** Check failure stack trace: *** Aborted (core dumped)

Opened this issue · 10 comments

I was the one who asked the other question, and now I've run into another problem.
I ran demo.py from your affordance-net code, but I get the error shown in the screenshot below.

[screenshot of the CUDA out-of-memory error]

Have you seen this error? If so, how did you resolve it?

Thank you!

nvidia-smi:

[screenshot of nvidia-smi output]

Note: I started it with python demo.py --gpu 1

nqanh commented

Try CUDA_VISIBLE_DEVICES=1 python demo.py. The --gpu option from Caffe doesn't always work.
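If you'd rather do it inside the script, here is a minimal sketch assuming the standard pycaffe API (this is not part of the original demo.py):

import os
# Mask GPUs before Caffe initializes CUDA; equivalent to
# running: CUDA_VISIBLE_DEVICES=1 python demo.py
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import caffe
caffe.set_mode_gpu()
caffe.set_device(0)   # after masking, device 0 is the physical GPU 1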

I tried that, but it doesn't work. Maybe I need to modify the test prototxt?

I solved it by modifying config.py.

@nqanh
I've made many attempts with the F-measure evaluation code you helped me with, but my experimental results differ too much from those in the paper.
Excuse me, but could you share the evaluation code you used?

nqanh commented

Here is the MATLAB code we use. Make sure you change the parameters based on your dataset, and be careful with MATLAB indexing (it starts from 1):

function F_wb_non_rank = evaluate_Fwb_non_rank(path_predicted, path_gt)
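% Note: getAllFiles (lists the files in a folder) and WFb (the weighted
% F-beta measure) are external helper functions and must be on the MATLAB path.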

% affordances index
aff_start=2;   % ignore {background} label
aff_end=10;   % change based on the dataset 

% get all files
list_predicted = getAllFiles(path_predicted);   % get all prediction files
list_gt = getAllFiles(path_gt);
list_predicted = sort(list_predicted);
list_gt = sort(list_gt); % make the same style
assert(length(list_predicted)==length(list_gt)); % test length
num_of_files = length(list_gt);

F_wb_non_rank = [];

for aff_id = aff_start:aff_end  % from 2 --> final_aff_id
    F_wb_aff = nan(num_of_files,1);  % reset for each affordance so scores from a previous affordance do not leak into its average
    for i=1:num_of_files
        
        fprintf('------------------------------------------------\n');
        fprintf('affordance id=%d, image i=%d \n', aff_id, i);
        fprintf('current pred: %s\n', list_predicted{i});
        fprintf('current grth: %s\n', list_gt{i});
        
        %%read image      
        pred_im = imread(list_predicted{i}); 
        gt_im = imread(list_gt{i});

        fprintf('size pred_im: %d \n', size(pred_im));
        fprintf('size gt_im  : %d \n', size(gt_im));
        
        pred_im = pred_im(:,:,1);
        gt_im = gt_im(:,:,1);
       
        targetID = aff_id - 1; % labels are zero-indexed, so we subtract 1
        
        % only get current affordance
        pred_aff = pred_im == targetID;
        gt_aff = gt_im == targetID;
        
        if sum(gt_aff(:)) > 0 % only compute if the affordance has ground truth
            F_wb_aff(i,1) = WFb(double(pred_aff), gt_aff);  % call WFb function
        else
            %fprintf('no ground truth at i=%d \n', i);
        end
        
    end
    fprintf('Averaged F_wb for affordance id=%d is: %f \n', aff_id-1, nanmean(F_wb_aff));
    F_wb_non_rank = [F_wb_non_rank; nanmean(F_wb_aff)];
    
end


end
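Usage note (inferred from the code above): both folders must contain the same number of files, named so that sorting pairs each prediction with its ground-truth mask, and each file is read as a label image whose first channel holds the affordance IDs (0 = background, 1..aff_end-1 = affordances).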

@nqanh I suspect there is something wrong with the test code. I used the data you provided for training. When I test different iterations of the model, some models report "syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory".
When I test a failing model on the CPU, the same image with the same model sometimes gives different results, apparently due to an out-of-bounds memory access.

nqanh commented

You should test only after training for at least 50K iterations, and it's also strongly recommended to use a GPU with enough memory.

@nqanh @ambl2357 Did you solve this problem? I tested the 60K-iteration model and it still reports "error == cudaSuccess (2 vs. 0) out of memory". I wonder why? I don't think the number of training iterations should make a difference.

@superchenyan I solved it by modifying config.py: for training I changed __C.TRAIN.BATCH_SIZE from 32 to 16, and for testing I changed __C.TEST.MAX_SIZE from 1000 to 500.
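For reference, a minimal sketch of those two settings in an easydict-based config.py (the surrounding layout is assumed from py-faster-rcnn, which affordance-net builds on):

from easydict import EasyDict as edict

__C = edict()
cfg = __C
__C.TRAIN = edict()
__C.TEST = edict()

# Training: halve the RoI minibatch size to reduce GPU memory usage
__C.TRAIN.BATCH_SIZE = 16   # was 32

# Testing: cap the longest image side so test-time blobs fit in GPU memory
__C.TEST.MAX_SIZE = 500     # was 1000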

@nqanh
I fine-tuned the ImageNet pre-trained model on my custom dataset and renamed the bbox_pred layer to 'bbox_pred_face'.
The problem comes from the snapshot wrapper in train.py: it only works if your bounding-box regression layer is literally named 'bbox_pred', so with a renamed layer the training snapshots come out wrong. I hope others don't make the same mistake I did.
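For context, a minimal sketch of the bbox-weight un-normalization that a py-faster-rcnn-style snapshot() applies only to a layer literally named 'bbox_pred'; the values below are illustrative, not taken from affordance-net:

import numpy as np

# Default normalization constants in py-faster-rcnn-style configs
bbox_means = np.array([0.0, 0.0, 0.0, 0.0])
bbox_stds  = np.array([0.1, 0.1, 0.2, 0.2])

# Toy weights/biases of a 4-output bbox regression layer with 3 inputs
W = np.random.randn(4, 3)
b = np.random.randn(4)

# What snapshot() does before saving, so the deployed net predicts raw deltas.
# If your layer has another name (e.g. 'bbox_pred_face'), this step is silently
# skipped and the saved weights stay normalized, giving wrong boxes at test time.
W_unnorm = W * bbox_stds[:, np.newaxis]
b_unnorm = b * bbox_stds + bbox_means
print(W_unnorm.shape, b_unnorm.shape)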

@ambl2357 I met the same problem. Did modifying config.py fix it for you?