jmpap/YOLOV2-Tensorflow-2.0

Error: StopIteration

CotarP opened this issue · 17 comments

Hi.

I'm wondering if you could help me with an issue. When running cell 21, I get the following error:
[Screenshot of the error]

Thank you

jmpap commented

Hi, the error means the generator (train_gen) is empty. In your case, the problem seems to be that train_gen doesn't generate any data.
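StopIteration is what Python raises when next() is called on an iterator that has nothing left to yield, so an empty train_gen surfaces as this error as soon as code asks it for a batch. A minimal sketch for checking this before training starts; peek_first_batch is a hypothetical helper, not part of the notebook.

```python
import itertools


def peek_first_batch(batches):
    """Fetch the first batch from an iterable of batches, failing loudly if it is empty.

    Returns (first_batch, iterator), where the iterator still yields every
    batch, including the one that was peeked at.
    """
    it = iter(batches)
    try:
        first = next(it)  # an empty generator raises StopIteration here
    except StopIteration:
        raise ValueError(
            "the data generator yielded no batches; check the image/annotation "
            "directories and file patterns"
        ) from None
    # Put the consumed batch back in front so training still sees it.
    return first, itertools.chain([first], it)


# Hypothetical usage with the notebook's generator:
# first_batch, train_gen = peek_first_batch(train_gen)
```

Raising a ValueError with a message turns a bare StopIteration into a hint about where to look (usually the input paths).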

I fixed the previous issue, and now everything works until training. I get all the intermediate results, so I assume everything is working. But when I start training the model, I get this:
[Screenshot of the output when training starts]

Do you have any idea what the reason could be?

jmpap commented

You can check if the data generator works properly (cell 22).

If the data generator works:

  • perhaps your learning rate is too high?
  • check that pixel values are in the interval [0., 1.].
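A minimal sketch of the pixel-range check, assuming the batch has already been flattened into plain Python numbers (for example with x_batch.numpy().ravel().tolist() on an eager tensor, where x_batch is a hypothetical name for the images yielded by the generator). check_pixel_range is likewise a hypothetical helper.

```python
def check_pixel_range(pixels, low=0.0, high=1.0):
    """Return (min, max) of a flat iterable of pixel values.

    Raises ValueError if there are no values or if any value falls
    outside the closed interval [low, high].
    """
    values = list(pixels)
    if not values:
        raise ValueError("no pixel values to check")
    lo, hi = min(values), max(values)
    if lo < low or hi > high:
        raise ValueError(f"pixel values span [{lo}, {hi}], expected [{low}, {high}]")
    return lo, hi


# Hypothetical usage on one batch from the generator:
# print(check_pixel_range(x_batch.numpy().ravel().tolist()))
```

A batch still in [0, 255] fails this check immediately, which makes a missing normalization step easy to spot.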

Thanks for the suggestions. With my input the data generator works properly.

Then I tried with your images and annotations, without changing any parameters. I still get the same result while training.

I don't know why the notebook would not work with the same settings.
I also checked your images, and the pixel values are not between 0 and 1. Should I transform the images before training?

I also noticed that the Jupyter notebook says TensorFlow 2.1.0, while your README says 2.0. I am now using 2.1, but I don't think that would be the problem.

jmpap commented

Are you using tensorflow 2 notebook version (Yolo_V2_tf_2.ipynb) or tf 1 version (Yolo_V2_tf_eager.ipynb)?

Pixel values in the dataset are in the range [0, 255]. Cell 13 converts them to the range [0., 1.]: tf.image.convert_image_dtype(x_img, tf.float32)
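For uint8 input, tf.image.convert_image_dtype scales by 1/255 when converting to a float type, so 0 maps to 0.0 and 255 maps to 1.0; values equal to 1 are therefore expected. If the input tensor is already a float type, the function only casts it and does not rescale. The pure-Python sketch below reproduces the uint8 scaling for illustration only; to_unit_range is a hypothetical name, not a TensorFlow function.

```python
def to_unit_range(pixels_uint8):
    """Scale 8-bit pixel values in [0, 255] to floats in the closed interval [0.0, 1.0]."""
    out = []
    for v in pixels_uint8:
        if not 0 <= v <= 255:
            raise ValueError(f"{v} is not a valid 8-bit pixel value")
        out.append(v / 255.0)  # 255 / 255.0 is exactly 1.0
    return out


scaled = to_unit_range(range(256))
print(min(scaled), max(scaled))
```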

I don't think the TensorFlow version is causing the problem during training.

I will try to test the notebook on my side.

I'm using tensorflow 2 notebook version (Yolo_V2_tf_2.ipynb).

jmpap commented

I cloned the repository on my computer and launched Yolo_V2_tf_2.ipynb. The training is working well. I am using tensorflow 2.4.1.

Please check the pixel values just before they enter the model, and try different learning rates.

I checked the pixel values in a few places and found that in most cases the values go from 0 to 1, including 1.
And I am still using your data.

jmpap commented

Ok.

  • Does the notebook work well when you just clone the repository and run it?
  • Can you try a lower learning rate? For example, learning_rate = 1e-6 (cell 27: optimizer = tf.keras.optimizers.Adam(learning_rate=1e-6, beta_1=0.9, beta_2=0.999, epsilon=1e-08))

I saved the code from GitHub and opened it in Jupyter. The only thing I changed was the image/annotation directories.
I tried learning_rate = 1e-6 and also 10e-7, and it's still the same.
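A small sketch for trying learning rates on a log scale; the helper name is hypothetical. Note that in Python the literal 10e-7 means 10 × 10^-7, which is the same value as 1e-6, so the second attempt above repeated the first; one step lower would be 1e-7.

```python
def log_spaced_learning_rates(start_exp=-3, stop_exp=-7):
    """Return [1e{start_exp}, 1e{start_exp - 1}, ..., 1e{stop_exp}] as floats."""
    # Parsing the decimal literal avoids rounding noise from 10.0 ** exp.
    return [float(f"1e{e}") for e in range(start_exp, stop_exp - 1, -1)]


rates = log_spaced_learning_rates()
print(rates)
# For each rate, the optimizer from cell 27 would be rebuilt, e.g.
# tf.keras.optimizers.Adam(learning_rate=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
```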

jmpap commented

Does the notebook work correctly when you do not change the image/annotation directories?

It's the same.

jmpap commented

I have no more ideas to solve this problem.

I think I found the problem. There must be a problem with the GPU 'connection', since it worked once I changed the code to use only the CPU. I didn't think of that before, since it looks like it can access the GPU (in cell 2).
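The thread does not say how the code was switched to the CPU; a minimal sketch of one common way, as an assumption: setting the environment variable CUDA_VISIBLE_DEVICES to "-1" hides all CUDA devices, and it has to be set before TensorFlow is imported. Alternatively, in TF 2.1 and later, calling tf.config.set_visible_devices([], 'GPU') at the top of the notebook, before any op runs, has a similar effect. force_cpu_only is a hypothetical helper name.

```python
import os


def force_cpu_only():
    """Hide all CUDA GPUs from libraries started later in this process.

    Must run before TensorFlow is imported; once TensorFlow has
    initialised its devices, changing the variable has no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu_only()
# import tensorflow as tf  # import TensorFlow only after the variable is set
```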

If you still have any ideas, I would be happy to try them.
But either way, thank you for all your time and help.

I also noticed a huge difference in the performance of cell 22. On the GPU this cell needs a few minutes to compile, while on the CPU it finishes immediately.

jmpap commented

I'm glad you're close to a solution. On my computer, cell 22 runs very quickly on the GPU. Maybe you can check your graphics card driver installation.
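A minimal sketch for a first driver check from Python, assuming an NVIDIA card whose driver installs the nvidia-smi tool on the PATH (nvidia_driver_report is a hypothetical helper name). nvidia-smi prints the driver version and the highest CUDA version the driver supports, which can be compared with the CUDA/cuDNN versions the installed TensorFlow release was built against. Inside the notebook, tf.config.list_physical_devices('GPU') (TF 2.1 and later) lists the GPUs TensorFlow can see, although, as this thread shows, a visible GPU does not guarantee that training works.

```python
import shutil
import subprocess


def nvidia_driver_report(timeout=30):
    """Return the text printed by nvidia-smi, or None if it is unavailable or fails."""
    exe = shutil.which("nvidia-smi")  # installed with the NVIDIA driver on most systems
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe], capture_output=True, text=True, timeout=timeout, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


report = nvidia_driver_report()
print(report if report is not None else "nvidia-smi not found or failed: check the driver install")
```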