Ze-Yang/Context-Transformer

cannot import name '_mask' from 'utils.pycocotools'

Closed this issue · 9 comments

Your code used to run properly, but after the GPU driver on our lab's server was reinstalled, I encountered this error: "ImportError: cannot import name '_mask' from 'utils.pycocotools'". I tried re-uploading the 'pycocotools' folder, but it didn't help.

It's probably because your new CUDA version does not match the one you originally compiled the program with.

  1. Ensure that the CUDA environment is fully installed, including the compiler, tools, and libraries. Also make sure that the cudatoolkit version in the conda environment matches the one you compile with. Check this with nvcc -V and conda list | grep cudatoolkit; the reported versions should be the same (see the sketch after this list).
  2. After that, delete the compiled output files (e.g., *.o) inside the utils/nms/ and utils/pycocotools/ folders, then recompile with the command sh make.sh.
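
Below is a minimal Python sketch of that check-and-rebuild procedure, assuming nvcc and conda are on your PATH and that you run it from the repository root; the helper function names are just for illustration and are not part of the repo.

```python
# Sketch: verify nvcc and conda cudatoolkit versions match, then clean and recompile.
import glob
import os
import re
import subprocess

def nvcc_version():
    # Parse the release number (e.g. "10.1") from `nvcc -V`.
    out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
    m = re.search(r"release (\d+\.\d+)", out)
    return m.group(1) if m else None

def conda_cudatoolkit_version():
    # Parse the major.minor cudatoolkit version from `conda list`.
    out = subprocess.run(["conda", "list", "cudatoolkit"],
                         capture_output=True, text=True).stdout
    m = re.search(r"cudatoolkit\s+(\d+)\.(\d+)", out)
    return f"{m.group(1)}.{m.group(2)}" if m else None

def clean_build_outputs(folders=("utils/nms", "utils/pycocotools")):
    # Remove stale compiled artifacts so `sh make.sh` rebuilds everything.
    for folder in folders:
        for pattern in ("*.o", "*.so"):
            for path in glob.glob(os.path.join(folder, pattern)):
                os.remove(path)
                print("removed", path)

if __name__ == "__main__":
    nvcc, toolkit = nvcc_version(), conda_cudatoolkit_version()
    print("nvcc:", nvcc, "| conda cudatoolkit:", toolkit)
    if nvcc is not None and nvcc == toolkit:
        clean_build_outputs()
        subprocess.run(["sh", "make.sh"], check=True)
    else:
        print("CUDA version mismatch -- fix the environment before recompiling.")
```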

Hope that it helps. Thanks.

Yes, it was indeed a compilation problem. Thanks.

Could you share the hyperparameter settings for the pre-training phase? I pre-trained the model on COCO for 80k iterations, and the total loss stayed at 6.3 for the last 20k iterations. However, the AP of my pre-trained model on COCO nonvoc_minival is 20, not as high as that of the model you provided on Baidu Pan, which is around 27.

The number of iterations you ran is not the same as what I specify. Note that all the training details have been set for you. Feel free to run the command python train.py --save-folder weights/COCO60_pretrain -d COCO -p 1 to obtain the pre-training model on COCO60; the AP should be about 27~28%. Thanks.

Thanks for your reply. I did use the command you provided in the README, except that num_gpu is set to 1. So far the training has run for 100k iterations, but the total loss and AP are almost the same as at 60k iterations (loss: 6.3, AP: 20.5). I assume that's not normal?

I haven't tried training on 1 GPU yet. It's highly recommended to run on multiple GPUs for a larger batch size. FYI, you can run with the extra argument --log to record the training details and figure out where the problem lies; an example command is given below. Alternatively, you can directly download the model I have already trained. Hope that helps.
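For instance, appending the flag to the pre-training command from above gives: python train.py --save-folder weights/COCO60_pretrain -d COCO -p 1 --log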

I'll try multiple GPUs, but since the batch size remains at the default setting (64), I don't think it would make a huge difference. As for your pre-trained model, I can't use it directly since I modified the backbone (the current training is run with the original VGG16 net for further comparison). By the way, is there any chance you still have the original pre-training log file or TensorBoard event files? If so, could you share them with me? It would help a lot. Thanks.

FYI, here is the training log.txt. As for the TensorBoard file, I can send it to you via email since GitHub does not support that file type. Kindly let me know if you need it.

Thanks for sharing; the TensorBoard file is not necessary now. I checked your log file and it turns out to be very similar to mine. I was just being too impatient.