Key not found in checkpoint when restoring saved model

Question

Closed this issue 5 years ago · 4 comments

Hello,

I'm trying to test deephyp using the provided example scripts, and I get an error when using the trained model for testing, at line 45 of autoencoder_test_MLP_basic.py:
dataZ = net.encoder( modelName='csa_100', dataSamples=hypData.spectraPrep )

The file models\test_ae_mlp\epoch_100\model.ckpt exists, but the following error is raised:

NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key Variable_72 not found in checkpoint
[[node save_5/RestoreV2 (defined at ..\deephyp\network_ops.py:300) ]]

My config:

Win10 with Anaconda
Python 3.5
TensorFlow 1.14.0
Numpy 1.17.2

Any ideas?
Thanks,
Alex

Answer 1 · 2019-10-03T06:17:18.000Z

Hi Alex,

It seems that the architecture of the trained model does not match the architecture of the model set up in the testing script. Just to clarify, are you importing deephyp from PyPI or working directly off the GitHub repo? And are the train and test scripts you ran modified from the examples in any way?

I just cloned the GitHub repo and ran autoencoder_train_MLP_basic.py and autoencoder_test_MLP_basic.py unmodified using Python 3.5, and it didn't produce any errors. Maybe try doing that first so we can rule out any system issues (if you haven't already done so).

Does the config file you get look like this?
{"weightInitOpt": "truncated_normal", "encodersize": [50, 30, 10], "weightStd": 0.1, "activationFunc": "relu", "activationFuncFinal": "linear", "skipConnect": false, "inputSize": 103, "tiedWeights": [0, 0, 0]}
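
One way to answer that question without comparing by eye is to load the saved config and diff it against the reference above. The following is only a minimal sketch using the standard library; the helper names `load_config` and `config_differences` are made up for illustration and are not part of deephyp.

```python
import json

# Reference config, copied from the JSON quoted above.
EXPECTED = json.loads(
    '{"weightInitOpt": "truncated_normal", "encodersize": [50, 30, 10], '
    '"weightStd": 0.1, "activationFunc": "relu", '
    '"activationFuncFinal": "linear", "skipConnect": false, '
    '"inputSize": 103, "tiedWeights": [0, 0, 0]}'
)

MISSING = "<missing>"  # placeholder reported when a key is absent on one side


def load_config(path):
    """Read a JSON config file into a dict (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def config_differences(actual, expected=EXPECTED):
    """Return {key: (actual_value, expected_value)} for every differing key."""
    diffs = {}
    for key in sorted(set(actual) | set(expected)):
        a = actual.get(key, MISSING)
        e = expected.get(key, MISSING)
        if a != e:
            diffs[key] = (a, e)
    return diffs
```

For example, `config_differences(load_config(os.path.join('models', 'test_ae_mlp', 'config.json')))` (with `import os`) returns an empty dict when the saved config matches the reference, and otherwise lists each mismatched key with both values.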

If the config file is correct but for some reason is not being read properly, you can specify the architecture without using a config file. Try swapping this line in autoencoder_test_MLP_basic.py:

net = autoencoder.mlp_1D_network( configFile=os.path.join('models','test_ae_mlp','config.json') )

with this line:

net = autoencoder.mlp_1D_network( inputSize=hypData.numBands, encoderSize=[50,30,10], activationFunc='relu', weightInitOpt='truncated_normal', tiedWeights=None, skipConnect=False )

Let me know how it goes,
Lloyd

Answer 2 · 2019-10-06T13:59:16.000Z

Hi, I tried adding these lines at the end of the train script, and it works for both CNN and MLP:

from tensorflow.python.framework import ops
ops.reset_default_graph()

Let me know if it works for you.
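
A likely reason the reset helps: in TensorFlow 1.x, unnamed variables created in the same default graph receive automatically numbered names (Variable, Variable_1, Variable_2, ...). If the train and test scripts run in the same Python session without a reset, the test network's variables continue the numbering, so their names are absent from the checkpoint. The sketch below is a pure-Python toy model of that naming behaviour, not TensorFlow code; `ToyGraph`, `build_network`, and the variable count are invented for illustration.

```python
class ToyGraph:
    """Toy stand-in for a TF1 default graph that hands out unique names."""

    def __init__(self):
        self._count = 0

    def new_variable(self):
        # The first unnamed variable is "Variable"; later ones get a suffix.
        if self._count == 0:
            name = "Variable"
        else:
            name = "Variable_{}".format(self._count)
        self._count += 1
        return name


def build_network(graph, n_vars):
    """Pretend to build a network by creating n_vars unnamed variables."""
    return [graph.new_variable() for _ in range(n_vars)]


N_VARS = 6  # hypothetical number of weight and bias variables

# Case 1: train and test networks are built in the same graph, no reset.
graph = ToyGraph()
checkpoint_keys = set(build_network(graph, N_VARS))  # names saved by training
test_keys = build_network(graph, N_VARS)             # numbering continues
missing_without_reset = [k for k in test_keys if k not in checkpoint_keys]

# Case 2: the graph is replaced between training and testing.
graph = ToyGraph()
checkpoint_keys = set(build_network(graph, N_VARS))
graph = ToyGraph()  # plays the role of ops.reset_default_graph()
test_keys = build_network(graph, N_VARS)
missing_with_reset = [k for k in test_keys if k not in checkpoint_keys]

print("missing without reset:", missing_without_reset)
print("missing with reset:", missing_with_reset)
```

In the toy model, every test-side name is missing from the checkpoint in the first case and none are missing in the second, which mirrors the "Key Variable_72 not found in checkpoint" message in the original report.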

Answer 3 · 2019-10-07T11:31:41.000Z

Thanks a lot for your help, @lloydwindrim and @puneetmishra2.
Resetting the default graph using ops.reset_default_graph() solved the issue!

Answer 4 · 2019-10-15T02:29:31.000Z

Fantastic!