tensorflow/tpu

Pretrained EfficientNet on GPU throws an error: Key efficientnet-b5/blocks_0/conv2d/kernel/RMSProp not found in checkpoint

mijung-kim opened this issue · 4 comments

Ubuntu 16.04 LTS
TF 1.15
Python 3.7
Using docker

Command to reproduce (I used my own data):
$ CUDA_VISIBLE_DEVICES=0 python main.py --data_dir $MY_CUSTOM_DATA --num_label_classes=2 --model_dir=efficientnet-b5 --model_name=efficientnet-b5

I have tried the pre-trained efficientnet-b1, b4, and b5 checkpoints, and all of them give the same error, shown below. Please let me know if you have found a solution.

tensorflow.python.framework.errors_impl.NotFoundError: Key efficientnet-b5/blocks_0/conv2d/kernel/RMSProp not found in checkpoint
[[{{node save/RestoreV2}}]]

@mijung-kim
I have run into the same error:
"NotFoundError: Key efficientnet-lite0/blocks_0/conv2d/kernel/RMSProp not found in checkpoint" and
"NotFoundError: Key efficientnet-b0/blocks_0/conv2d/kernel/RMSProp not found in checkpoint"
with efficientnet-lite and efficientnet respectively.
Do you know how to fix it?

Same error here!

It looks like the released ckpt was trained using the 'sgd' optimizer. I fixed this error by changing optimizer_name to 'sgd' in main.py when restoring params from the released ckpt.

optimizer = utils.build_optimizer(learning_rate, 'sgd')
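
A note on why this works: plain SGD creates no slot variables, so the graph no longer asks the checkpoint for any `.../RMSProp` keys. If you want to keep RMSProp, a possible alternative is to warm-start only the variables the checkpoint actually contains and let everything else initialize fresh. The following is a minimal sketch under assumptions, not code from this repo: `select_restorable` and `build_assignment_map` are hypothetical helpers, `ckpt_path` is a placeholder, and the commented usage assumes `--model_dir` points at a fresh directory (as far as I understand, Estimator otherwise restores every graph variable from the latest checkpoint it finds there).

```python
def select_restorable(graph_names, ckpt_names):
    """Split graph variable names into those present in the checkpoint and the rest.

    Hypothetical helper: both arguments are plain lists of variable names.
    """
    ckpt = set(ckpt_names)
    restore = [name for name in graph_names if name in ckpt]
    skipped = [name for name in graph_names if name not in ckpt]
    return restore, skipped


def build_assignment_map(restore_names):
    """Map each checkpoint key to the graph variable with the same name."""
    return {name: name for name in restore_names}


# Possible usage inside model_fn (assumption, untested against this repo):
#
#   import tensorflow as tf
#   ckpt_names = [name for name, _ in tf.train.list_variables(ckpt_path)]
#   graph_names = [v.op.name for v in tf.compat.v1.global_variables()]
#   restore, skipped = select_restorable(graph_names, ckpt_names)
#   tf.compat.v1.train.init_from_checkpoint(ckpt_path, build_assignment_map(restore))
#   # Names in `skipped` (e.g. ".../kernel/RMSProp") keep their default initializers.
```

With this approach the optimizer state starts from scratch, which is usually what you want when fine-tuning on your own data anyway.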

Thanks guys! That got me a bit further, but after that I ran into another issue:

WARNING:tensorflow:Reraising captured error
W0112 22:04:47.559992 140622358390592 error_handling.py:149] Reraising captured error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key global_step not found in checkpoint
         [[{{node save/RestoreV2}}]]
  (1) Not found: Key global_step not found in checkpoint
         [[{{node save/RestoreV2}}]]
         [[save/RestoreV2/_403]]
0 successful operations.
0 derived errors ignored.

Any idea how to solve it? I am using TF 2.3.
Does this only work with TF 1.15?
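
Both errors in this thread have the same shape: the graph asks for a key that the checkpoint file does not contain. Whatever the TF version, it may help to list what is actually in the checkpoint and group the missing names. The snippet below is only a diagnostic sketch under assumptions: `classify_missing` and `checkpoint_names` are hypothetical helpers, and the slot pattern covers only the `RMSProp` naming seen in the errors above. `tf.train.list_variables` is the one TensorFlow call it relies on.

```python
import re

# TF1 optimizer slots are named "<var>/RMSProp", "<var>/RMSProp_1", ...
_SLOT_RE = re.compile(r"/RMSProp(_\d+)?$")


def classify_missing(expected_names, ckpt_names):
    """Group the names the graph expects but the checkpoint lacks."""
    ckpt = set(ckpt_names)
    groups = {"optimizer_slots": [], "global_step": [], "other": []}
    for name in expected_names:
        if name in ckpt:
            continue  # present in the checkpoint, nothing to report
        if _SLOT_RE.search(name):
            groups["optimizer_slots"].append(name)
        elif name == "global_step":
            groups["global_step"].append(name)
        else:
            groups["other"].append(name)
    return groups


def checkpoint_names(ckpt_dir_or_file):
    """Read variable names from a checkpoint (TensorFlow imported lazily)."""
    import tensorflow as tf
    return [name for name, _shape in tf.train.list_variables(ckpt_dir_or_file)]
```

If everything missing lands in `optimizer_slots` or `global_step` and `other` stays empty, the checkpoint most likely holds only model weights, so a full training-state restore from it cannot succeed.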