AITTSMD/MTCNN-Tensorflow

Process KILLED issue in gen_hard_example.py

Zepyhrus opened this issue · 4 comments

"""
time cost in average1.144 pnet 1.144 rnet 0.000 onet 0.000
boxes length: 12880
finish detecting
save_path is :
../images/no_LM12/RNet
Killed
"""

Hi mate, thanks for your brilliant open-source work. I hit this issue while generating training examples for RNet.

I'm using:

  • tensorflow-gpu==1.12.0;
  • GPU: Nvidia Quadro P2000, 5 GB;
  • Python == 3.6.8;
  • Ubuntu 18.04;

Any idea what causes this? Could it be related to OOM, and if so, how can I solve it?

This is so frustrating: each time, the test part takes around 3 hours and then the process gets killed during the pickle.dump step. Is it because of the huge size of the detections generated by mtcnn_detector.detect_face(test_data)?
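One way to check whether memory is the culprit is to log the process's peak resident memory around the dump step. The following is only a minimal sketch, assuming a Linux host (where `ru_maxrss` is reported in KiB); the `detections` list is a small hypothetical stand-in for the real detector output, not the repository's data.

```python
import pickle
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB.

    Assumes Linux, where ru_maxrss is reported in KiB.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def report(label):
    """Print the peak memory seen so far to stderr."""
    print(f"{label}: peak RSS {peak_rss_mib():.1f} MiB", file=sys.stderr)


# Hypothetical stand-in for the detector output: one list of
# [x1, y1, x2, y2, score] boxes per image.
detections = [[[0.0, 0.0, 12.0, 12.0, 0.9]] * 100 for _ in range(1000)]
report("after detection")

# Serializing everything at once needs memory on top of the cached boxes.
payload = pickle.dumps(detections, protocol=pickle.HIGHEST_PROTOCOL)
report("after pickling")
```

If the reported peak approaches the machine's RAM shortly before the process dies, a bare "Killed" message is consistent with the kernel's OOM killer having terminated it.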

Problem solved. This is what the author is doing: load images from disk explicitly -> generate the boxes (at this point, all the boxes are cached in memory) -> dump the boxes to disk for the convenience of debugging -> restore the dumped pickle file later to save the hard example images.

It is so hard to believe that someone would save an intermediate variable to disk just for debugging...

@Zepyhrus Hey, I'm sorry to bother you, but I'm facing the same problem and I don't know how to fix it. Can you please tell me what I should do to solve it? Thank you so much.

@yaoyao14 Hi, here is the thing, hope this still helps:

  • you don't actually need to fix it; as I mentioned above, this is not the right way to write a pipeline;
  • if you still want to use the author's pipeline without this bug, you can merge the box-generation and example-saving steps and get rid of the dump step (or just use another format, such as .csv, instead of .pickle). This is what I did.
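The suggestion above can be sketched as a streaming loop that handles one image at a time, so the full set of boxes is never held in memory. This is a rough illustration under stated assumptions, not the repository's code: `detect_boxes`, `iter_detections`, `save_boxes_streaming`, and the CSV column names are all hypothetical, and `detect_boxes` merely stands in for a per-image detector call.

```python
import csv
import os
import tempfile


def detect_boxes(image_path):
    """Hypothetical stand-in for a per-image detector call.

    Returns a list of (x1, y1, x2, y2, score) tuples.
    """
    return [(0.0, 0.0, 24.0, 24.0, 0.95), (10.0, 5.0, 34.0, 29.0, 0.80)]


def iter_detections(image_paths):
    """Yield (path, boxes) one image at a time instead of building a big list."""
    for path in image_paths:
        yield path, detect_boxes(path)


def save_boxes_streaming(image_paths, csv_path):
    """Write each image's boxes to CSV as soon as they are produced."""
    rows = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image", "x1", "y1", "x2", "y2", "score"])
        for path, boxes in iter_detections(image_paths):
            for box in boxes:
                writer.writerow([path, *box])
                rows += 1
    return rows


out_path = os.path.join(tempfile.mkdtemp(), "rnet_boxes.csv")
n_rows = save_boxes_streaming(["a.jpg", "b.jpg", "c.jpg"], out_path)
```

The same loop body could crop and save the hard example images directly instead of writing CSV rows; either way, memory use stays bounded by a single image's detections rather than growing with the dataset.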