gbrlfaria/rune-breaker

Arrow Samples


Hi Gabriel! I was able to generate the arrow samples successfully, but am now stuck on the training step. I'm getting the following error:

Classification model training application started.

Settings
           value
max_epochs   240
patience      80
batch_size   128

Creating model...
2020-07-10 19:48:06.871895: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)

Creating generators...
Found 0 images belonging to 0 classes.
Found 0 images belonging to 0 classes.

Fitting model...
WARNING:tensorflow:From model/train.py:121: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
Please use Model.fit, which supports generators.

Any ideas as to why that might be? Any help would be greatly appreciated!

Hello, Dan!

After some research, it seems that the first 'error' is just a fallback message telling you that TensorFlow is using the CPU instead of the GPU because it failed to initialize CUDA. As you can see, the program keeps running normally after the message, so you shouldn't worry about it. If you really want to use GPU processing, you should take a look at this page and verify your TensorFlow installation.
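If you want to confirm which device TensorFlow ends up using, a minimal sketch like the one below may help. It assumes TensorFlow 2.1 or newer, where `tf.config.list_physical_devices` is available; the helper name `visible_gpus` is made up for this example and is not part of the project.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if
    TensorFlow cannot be imported at all."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow will fall back to the CPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print("GPUs visible to TensorFlow:", visible_gpus())
```

An empty list here is consistent with the cuInit message above: training still works, only on the CPU.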

What seems to be causing problems, though, is the fact that the generators can't find any sample images.

Creating generators...
Found 0 images belonging to 0 classes.
Found 0 images belonging to 0 classes.

Your training, validation, and testing folders seem to be empty. Did you run the make_dataset.py script?
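A quick way to check this before training is to count the images in each class folder yourself. The snippet below is only a sketch: the `data/training`, `data/validation`, and `data/testing` paths and the list of image extensions are assumptions, so adjust them to wherever your dataset actually lives.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if your samples use another format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def count_images(split_dir):
    """Return {class_name: image_count} for each subfolder of split_dir.

    Returns an empty dict if split_dir does not exist or has no subfolders.
    """
    root = Path(split_dir)
    if not root.is_dir():
        return {}
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    # Hypothetical folder layout: one directory per split, one subfolder per class.
    for split in ("training", "validation", "testing"):
        print(split, count_images(Path("data") / split))
```

If every split prints an empty dict or only zeros, the generators will report "Found 0 images belonging to 0 classes." as in your log.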

Lastly, can you provide the versions of your TensorFlow and Keras installations? This may help us identify any incompatibilities if we are using different versions (you can check the expected versions in the requirements file).
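In case it helps, here is one way to read the installed versions without importing the libraries. This is just a sketch using `importlib.metadata` from the standard library (Python 3.8+); the helper name `installed_versions` is made up, and it assumes the packages were installed under the distribution names `tensorflow` and `keras`.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "keras")):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this exact name (e.g. a differently named build).
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```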

Hi, Gabriel - thanks for your reply! You were right, I completely missed the step for making the dataset.

My TensorFlow version: 2.2.0
My Keras version: 2.4.3

I was able to successfully create the dataset, but am now getting stuck at the training step. I'm currently getting this error:

Classification model training application started.

Settings
           value
max_epochs   240
patience      80
batch_size   128

Creating model...
2020-07-11 18:05:16.709735: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)

Creating generators...
Found 216 images belonging to 4 classes.
Found 110 images belonging to 4 classes.

Fitting model...
WARNING:tensorflow:From train.py:121: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
Please use Model.fit, which supports generators.
Epoch 1/240
Traceback (most recent call last):
  File "train.py", line 165, in <module>
    main(args.batch_size, args.model)
  File "train.py", line 37, in main
    fit(model, training, validation, batch_size)
  File "train.py", line 121, in fit
    history = model.fit_generator(
  File "C:\Users\dting\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "C:\Users\dting\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 1465, in fit_generator
    return self.fit(
  File "C:\Users\dting\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\dting\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 862, in fit
    val_logs = self.evaluate(
  File "C:\Users\dting\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\dting\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 1091, in evaluate
    logs = tf_utils.to_numpy_or_python_type(logs)
UnboundLocalError: local variable 'logs' referenced before assignment

Any help would be greatly appreciated - thank you so much for your time!

A quick search leads to this TensorFlow issue. It appears that this error happens when the batch size is larger than the number of samples in one of your datasets. In your case, the batch size is 128, while the validation set has only 110 images. Just set a batch size of at most 110 through the command line and you should be fine. You may want to change the number of epochs as well, which you can do directly in the file.
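To see why the numbers matter, here is the arithmetic as a small sketch. It assumes the number of steps per pass is computed as the sample count divided by the batch size, rounded down; I am using that as an illustration rather than quoting train.py, and both helper names are made up.

```python
def steps_per_pass(num_samples, batch_size):
    """Number of full batches available in one pass over a dataset."""
    return num_samples // batch_size


def largest_safe_batch_size(*dataset_sizes):
    """Largest batch size that still yields at least one full batch per dataset."""
    return min(dataset_sizes)


if __name__ == "__main__":
    training, validation = 216, 110  # image counts from the log above
    print(steps_per_pass(training, 128))    # 1
    print(steps_per_pass(validation, 128))  # 0: validation never runs a batch
    print(largest_safe_batch_size(training, validation))  # 110
```

With zero validation steps, the evaluation loop never runs, which is consistent with `logs` being referenced before assignment in the traceback.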

That did the trick! I was able to get everything working from start to finish; thank you so much for your help, and thanks for the great write-up!