weiliu89/caffe

training error: Data layer prefetch queue empty

Jimjipeng opened this issue · 14 comments

I0316 09:00:59.149740 2416 blocking_queue.cpp:50] Data layer prefetch queue empty

It is not a problem, just a warning.
It means data loading is slower than the net's forward pass, so the net has to wait for data I/O to prepare the next batch.
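As a rough illustration of what the warning means, here is a simplified producer/consumer queue. It is a sketch under assumptions, not Caffe's blocking_queue.cpp: `SketchBlockingQueue` and `SlowProducerDemo` are hypothetical names. The consumer (standing in for the net's forward pass) prints a message and blocks when the prefetch thread has not pushed the next batch yet. If the producer never pushes again, the consumer waits forever, which matches the "hang" reported below.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Simplified blocking queue (hypothetical; NOT Caffe's implementation).
template <typename T>
class SketchBlockingQueue {
 public:
  void push(const T& item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(item);
    }
    cond_.notify_one();  // wake a consumer waiting for data
  }

  // Pops the next item. If the queue is empty, prints `log_on_wait` once
  // and blocks until a producer pushes something.
  T pop(const std::string& log_on_wait) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (queue_.empty()) {
      std::cerr << log_on_wait << '\n';
      ++waits_;
    }
    // The predicate guards against spurious wakeups.
    cond_.wait(lock, [this] { return !queue_.empty(); });
    T item = queue_.front();
    queue_.pop();
    return item;
  }

  // Number of pops that found the queue empty.
  int waits() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return waits_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable cond_;
  std::queue<T> queue_;
  int waits_ = 0;
};

// Demo: a deliberately slow producer forces the consumer to wait.
inline int SlowProducerDemo() {
  SketchBlockingQueue<int> q;
  std::thread producer([&q] {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    q.push(42);
  });
  int value = q.pop("Data layer prefetch queue empty");
  producer.join();
  return value;
}
```

When data is already queued, `pop` returns immediately without logging; the message only appears when the consumer outruns the producer.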

Data layer prefetch queue empty
After that, the program hangs forever... @ujsyehao

I ran into the same problem. Have you solved it?

GPU power is too low; try using another device.

I've come across the same problem.
I'm using a Titan V and CUDA 9.0.

I hit the same problem today. I'm using a 1080 Ti and CUDA 9.2 on Windows 10.
Can anyone tell me the solution? Thanks.

@Jimjipeng I don't think it's a GPU power problem. I have switched between 2 devices, and the behaviour is the same on both: it prints this message and then hangs forever, with CPU usage at 100%.

However, my other YOLOv3 code, which also uses Caffe, does not have this problem.

Pinnh commented

I hit the same problem using a 1080 Ti and CUDA 10.0, but there is no problem with a GTX 1060 and CUDA 8.0. I don't think it's a GPU power problem either; the problem may be in annotated_data_layer.

Pinnh commented

I resolved it.
In src/caffe/util/sampler.cpp:

caffe_rng_uniform(1, 0.f, 1 - bbox_width, &w_off);
caffe_rng_uniform(1, 0.f, 1 - bbox_height, &h_off);

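caffe_rng_uniform can block when bbox_width or bbox_height is close to 1.0: floating-point rounding can push the extent slightly above 1.0, so (1 - bbox_width) becomes less than 0.f and the requested range is inverted.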
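To make the failure mode concrete: a width just one float step above 1.0 already makes `1.0f - bbox_width` negative, so a sampler that requires `lo <= hi` gets an inverted range. The sketch below is hypothetical: `CheckUniformRange` only mimics a `lo <= hi` precondition check and does not reproduce caffe_rng_uniform's actual behaviour, and `OffsetUpperBound` shows the clamp idea used in the fix.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical stand-in for a uniform sampler's lo <= hi precondition.
inline void CheckUniformRange(float lo, float hi) {
  if (!(lo <= hi)) {
    throw std::invalid_argument("uniform range: lo > hi");
  }
}

// Upper bound for a random left/top offset of a box with normalized extent
// `extent`. Clamping the extent to 1.0 keeps the range [0, bound] valid.
inline float OffsetUpperBound(float extent) {
  return 1.0f - std::fmin(extent, 1.0f);
}
```

For example, with `std::nextafter(1.0f, 2.0f)` as the width, the unclamped bound is negative and the range check fails, while `OffsetUpperBound` returns 0.0f.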

I changed the SampleBBox function as follows, and it works:

void SampleBBox(const Sampler& sampler, NormalizedBBox* sampled_bbox) {
  // Get random scale.
  CHECK_GE(sampler.max_scale(), sampler.min_scale());
  CHECK_GT(sampler.min_scale(), 0.);
  CHECK_LE(sampler.max_scale(), 1.);
  float scale;
  caffe_rng_uniform(1, sampler.min_scale(), sampler.max_scale(), &scale);

  // Get random aspect ratio.
  CHECK_GE(sampler.max_aspect_ratio(), sampler.min_aspect_ratio());
  CHECK_GT(sampler.min_aspect_ratio(), 0.);
  CHECK_LT(sampler.max_aspect_ratio(), FLT_MAX);
  float aspect_ratio;
  caffe_rng_uniform(1, sampler.min_aspect_ratio(), sampler.max_aspect_ratio(),
                    &aspect_ratio);
  // Explicit <float>: std::pow(float, double) returns double, and std::max /
  // std::min need both arguments to have the same type.
  aspect_ratio = std::max<float>(aspect_ratio, std::pow(scale, 2.));
  aspect_ratio = std::min<float>(aspect_ratio, 1 / std::pow(scale, 2.));

  // Figure out bbox dimension.
  float bbox_width = scale * sqrt(aspect_ratio);
  float bbox_height = scale / sqrt(aspect_ratio);
  // Clamp to 1.0 so that the offset ranges below are never negative.
  if (bbox_width >= 1.0) {
    bbox_width = 1.0;
  }
  if (bbox_height >= 1.0) {
    bbox_height = 1.0;
  }

  // Figure out top left coordinates.
  float w_off, h_off;
  caffe_rng_uniform(1, 0.f, 1.0f - bbox_width, &w_off);
  caffe_rng_uniform(1, 0.f, 1.0f - bbox_height, &h_off);
  sampled_bbox->set_xmin(w_off);
  sampled_bbox->set_ymin(h_off);
  sampled_bbox->set_xmax(w_off + bbox_width);
  sampled_bbox->set_ymax(h_off + bbox_height);
}
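For experimenting outside Caffe, the same clamped sampling logic can be sketched in standard C++. Assumptions: `SketchSampler`, `SketchBBox`, `UniformOrLow` and `SampleBBoxSketch` are hypothetical names standing in for Caffe's Sampler, NormalizedBBox and SampleBBox; `std::mt19937` with `std::uniform_real_distribution` replaces caffe_rng_uniform (their handling of the upper bound differs slightly), and `scale * scale` replaces `std::pow(scale, 2.)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical plain-struct stand-ins for Caffe's Sampler and NormalizedBBox.
struct SketchSampler {
  float min_scale, max_scale;
  float min_aspect_ratio, max_aspect_ratio;
};

struct SketchBBox {
  float xmin, ymin, xmax, ymax;
};

// Draws from [lo, hi); returns lo when the range is empty or inverted.
inline float UniformOrLow(std::mt19937& rng, float lo, float hi) {
  if (!(hi > lo)) return lo;
  return std::uniform_real_distribution<float>(lo, hi)(rng);
}

// Same steps as the patched SampleBBox: random scale and aspect ratio,
// extent clamped to 1.0, then a random top-left corner.
inline SketchBBox SampleBBoxSketch(const SketchSampler& s, std::mt19937& rng) {
  float scale = UniformOrLow(rng, s.min_scale, s.max_scale);
  float aspect_ratio =
      UniformOrLow(rng, s.min_aspect_ratio, s.max_aspect_ratio);
  aspect_ratio = std::max(aspect_ratio, scale * scale);
  aspect_ratio = std::min(aspect_ratio, 1.0f / (scale * scale));
  float bbox_width = std::min(scale * std::sqrt(aspect_ratio), 1.0f);
  float bbox_height = std::min(scale / std::sqrt(aspect_ratio), 1.0f);
  float w_off = UniformOrLow(rng, 0.0f, 1.0f - bbox_width);
  float h_off = UniformOrLow(rng, 0.0f, 1.0f - bbox_height);
  return {w_off, h_off, w_off + bbox_width, h_off + bbox_height};
}
```

With a sampler such as `{0.3f, 1.0f, 0.5f, 2.0f}`, sampled boxes stay within the unit square (up to float rounding), and a degenerate sampler with scale and aspect ratio both fixed at 1 yields the full-image box instead of an inverted offset range.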

After tracing this error, I finally narrowed it down: it is caused by zero-dimension images (zero width, zero height, or both), some of which come from premature float-to-int casts. Three functions in Caffe can trigger it: DataTransformer::CropImage, DataTransformer::ExpandImage, and SampleBBox in sampler.cpp. After I fixed them, training works fine on NVIDIA TX2 hardware.
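A minimal sketch of how a truncating float-to-int cast can yield a zero-size crop. `NaiveCropExtent` and `GuardedCropExtent` are hypothetical helpers, not the actual DataTransformer::CropImage or DataTransformer::ExpandImage code, and the one-pixel minimum is only one possible guard, assumed here for illustration (the commenter did not post their fix).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: converts a normalized span [lo, hi] to a pixel extent.
// static_cast<int> truncates toward zero, so very thin spans become 0.
inline int NaiveCropExtent(float lo, float hi, int image_size) {
  return static_cast<int>((hi - lo) * image_size);
}

// Assumed guard: never return an extent smaller than one pixel.
inline int GuardedCropExtent(float lo, float hi, int image_size) {
  return std::max(1, NaiveCropExtent(lo, hi, image_size));
}
```

For example, a span of 0.004 on a 200-pixel image truncates to 0 pixels with the naive cast, while the guarded version returns 1.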

@Pinnh's fix solved my problem, thanks.

@Pinnh's fix solved my problem too, thanks.

I had this problem too. In my case it happened because I was running two different trainings with Caffe: the first ran normally, but the second got stuck at "Data layer prefetch queue empty". I solved it by running the second training with a separately compiled copy of Caffe, and it worked.
It may be a special case, and I haven't figured out the reason; I'm posting it in case somebody runs into the same problem.


In reply to @Pinnh's SampleBBox fix: my issue is not fixed...