kwotsin/transfer_learning_tutorial

Not found: Key InceptionResnetV2/Repeat_1/block17_20/Conv2d_1x1/weights/Adam_1

lifematrix opened this issue · 12 comments

Hey, thanks for your nice work.

Your code is very clear, and I had no problem running it.
However, when restoring weights from the checkpoint file, I encountered:

2017-05-12 21:15:45.857973: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key InceptionResnetV2/Repeat_1/block17_20/Conv2d_1x1/weights/Adam_1 not found in checkpoint
         [[Node: save_1/RestoreV2_1287 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/RestoreV2_1287/tensor_names, save_1/RestoreV2_1287/shape_and_slices)]]
2017-05-12 21:15:45.858128: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key InceptionResnetV2/Repeat/block35_10/Branch_2/Conv2d_0a_1x1/weights/Adam_1 not found in checkpoint
2017-05-12 21:15:45.858508: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key InceptionResnetV2/Repeat_1/block17_20/Conv2d_1x1/weights/Adam_1 not found in checkpoint
         [[Node: save_1/RestoreV2_1287 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/RestoreV2_1287/tensor_names, save_1/RestoreV2_1287/shape_and_slices)]]
2017-05-12 21:15:45.858674: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key InceptionResnetV2/Repeat_1/block17_18/Branch_1/Conv2d_0a_1x1/BatchNorm/beta/Adam_1 not found in checkpoint
2017-05-12 21:15:45.858837: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key InceptionResnetV2/Repeat/block35_10/Branch_2/Conv2d_0b_3x3/BatchNorm/beta/Adam not found in checkpoint
2017-05-12 21:15:45.858861: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key InceptionResnetV2/Repeat/block35_10/Branch_2/Conv2d_0b_3x3/BatchNorm/beta/Adam_1 not found in checkpoint
2017-05-12 21:15:45.861648: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key InceptionResnetV2/Repeat_1/block17_20/Conv2d_1x1/weights/Adam_1 not found in checkpoint
         [[Node: save_1/RestoreV2_1287 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/RestoreV2_1287/tensor_names, save_1/RestoreV2_1287/shape_and_slices)]]

I also used the checkpoint inspection tools to display all the weights in the checkpoint file and could not find variables such as XXX/weights/Adam_1, nor could I find them in the network definition file, inception_resnet_v2.py. So I guess the XXX/weights/Adam_1 variables may be created by the optimization algorithm in train_flowers.py:

# Now we can define the optimizer that takes in the learning rate
optimizer = tf.train.AdamOptimizer(learning_rate=lr)

Thanks for help!

Steven

Hi Steven, thank you for your kind words.

The checkpoint model does not contain any Adam variables, and the issue is most likely due to an attempt to restore a checkpoint model from your log directory instead of the original TF-Slim checkpoint model. Because you have given the supervisor a log directory that already contains a checkpoint model from your previous training, it tries to restore from there instead. If you have changed your code since that previous training and intend a fresh start, you will hit this problem: the checkpoint model in the log directory has all these Adam variables from your previous training, which your fresh run of the code doesn't have.

Solution: if you have changed the code, delete the old checkpoint files from your log directory and run the script again.
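As an aside, when fine-tuning from the original TF-Slim checkpoint, the usual fix for missing-key errors is to restore only the variables that actually exist in the pretrained checkpoint; in TF-Slim that is what slim.get_variables_to_restore(exclude=...) does when building a tf.train.Saver. The core of that filtering is plain name matching; here is a minimal pure-Python sketch (the scope names below are assumptions based on this model's naming, not a definitive implementation):

```python
def variables_to_restore(all_variable_names,
                         exclude_scopes=('InceptionResnetV2/Logits',
                                         'InceptionResnetV2/AuxLogits')):
    """Keep only variables present in the pretrained checkpoint:
    drop optimizer slot variables (.../Adam, .../Adam_1) and any
    scopes that are trained from scratch."""
    kept = []
    for name in all_variable_names:
        if name.endswith('/Adam') or name.endswith('/Adam_1'):
            continue  # Adam slots exist only in your own training checkpoints
        if any(name.startswith(scope) for scope in exclude_scopes):
            continue  # new logits layers are initialized fresh, not restored
        kept.append(name)
    return kept

names = [
    'InceptionResnetV2/Conv2d_1a_3x3/weights',
    'InceptionResnetV2/Conv2d_1a_3x3/weights/Adam',
    'InceptionResnetV2/Conv2d_1a_3x3/weights/Adam_1',
    'InceptionResnetV2/Logits/Logits/weights',
]
print(variables_to_restore(names))  # only the first entry survives
```

A saver built over the surviving variables restores cleanly from the TF-Slim checkpoint, while the excluded variables are simply initialized by the fresh run.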

Hey, kwotsin,

Thanks for your quick response. It really works: I deleted the checkpoint files in the log directory, and now everything is OK.

Thanks again,

Steven

No worries. Glad it has helped you! :D

Somehow I get this host of issues while trying to run the eval code:

2017-05-22 14:04:56.938959: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean not found in checkpoint

Hi @Benuraj , the problem most likely comes from using the wrong saver in the code. I have updated the code and you should refer here for the solution: https://github.com/kwotsin/transfer_learning_tutorial/blob/master/README.md

Please update your code to the latest version in order to prevent this error.

Thanks a lot, now I'm able to run the eval code. I'm new to TensorFlow, but I have prior experience in Caffe and Torch7, and I'm unable to restore/freeze the model for use with the Python or C++ API in order to run predictions with the trained model. Would it be possible for you to guide me?

@Benuraj would you be able to check if this file here will fit your use? https://github.com/kwotsin/Tensorflow-Xception/blob/master/write_pb.py

This file is what I've been using to freeze my models. Simply change the output node names and log directory as appropriate to get your .pb file. Once you have the .pb file, running inference will be possible.
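For context, scripts like write_pb.py build on TensorFlow's convert_variables_to_constants, which replaces every variable in a graph with a constant holding its trained value so the whole model fits in one GraphDef. A minimal, self-contained sketch of that freezing step (the toy graph below is purely illustrative, not Inception-ResNet-v2; written against the tf.compat.v1 API so it also runs under TF 2.x):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()
tf1.disable_resource_variables()  # use classic VariableV2 ops, as in TF 1.x

graph = tf1.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, [None, 2], name='input')
    w = tf1.get_variable('w', initializer=[[1.0], [2.0]])
    y = tf1.identity(tf1.matmul(x, w), name='output')

    with tf1.Session(graph=graph) as sess:
        sess.run(tf1.global_variables_initializer())
        # Bake the current variable values into constants; only nodes
        # needed to compute the listed output node names are kept.
        frozen = tf1.graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), output_node_names=['output'])

# 'frozen' is a GraphDef with no Variable ops left; serializing it
# (e.g. with tf1.gfile.GFile(..., 'wb').write(frozen.SerializeToString()))
# produces the .pb file used for inference.
print([n.name for n in frozen.node])
```

The output node name you pass here plays the same role as the one you would set in write_pb.py.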

Thanks for the cordial response. I'm looking into it, and I have a query:

Does the InceptionResnetV2 model not have a Softmax layer?

Every graph gives a different name to each of its nodes, depending on how the original author wrote it. You can find the name of the softmax layer using:

for op in graph.get_operations():
    print(op.values())

From my experience, the softmax layer for Inception-resnet-v2 should be InceptionResnetV2/Logits/Predictions.
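One detail worth keeping in mind: InceptionResnetV2/Logits/Predictions is an op name, while get_tensor_by_name expects a tensor name, which is the op name plus an output index suffix such as :0. The convention is simple enough to capture in two small helpers (pure Python, no TensorFlow needed; the helper names are my own):

```python
def op_to_tensor_name(op_name, output_index=0):
    """An op's k-th output tensor is named '<op_name>:<k>'."""
    return '%s:%d' % (op_name, output_index)

def tensor_to_op_name(tensor_name):
    """Strip the ':<k>' output-index suffix to recover the op name."""
    return tensor_name.rsplit(':', 1)[0]

print(op_to_tensor_name('InceptionResnetV2/Logits/Predictions'))
```

So the op found via graph.get_operations() is fetched as a tensor by appending :0 to its name.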

I guessed so, but I got confused because in the Xception model the Softmax layer was named as such, while here for InceptionResnetV2 it is different.
Thanks for the clarification.

In order to evaluate the frozen model, are we to use it in this fashion?

with tf.Session() as sess:
    softmax_tensor = sess.graph.get_tensor_by_name('InceptionResnetV2/Logits/Predictions:0')
    for i in range(len(testimages)):
        image_data = tf.gfile.FastGFile(testimages[i], 'rb').read()
        # sess.run takes a feed_dict mapping the graph's input tensor to the
        # data, not a bare set; input_tensor_name depends on how the graph
        # was exported
        predictions = sess.run(softmax_tensor, feed_dict={input_tensor_name: image_data})

Generally, I would look up the tensor with get_tensor_by_name while building the graph, before running it in a session. Your approach might work, but I think it would be cleaner to just define it in the graph first.
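To illustrate that pattern on a toy graph (the placeholder and softmax below are stand-ins for the real frozen InceptionResnetV2 graph, and the code uses the tf.compat.v1 API so it also runs under TF 2.x): look the tensor up once while setting the graph up, then reuse it for every sess.run call.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

graph = tf1.Graph()
with graph.as_default():
    # Stand-in for the loaded frozen graph
    logits = tf1.placeholder(tf.float32, [None, 3], name='logits')
    tf1.nn.softmax(logits, name='predictions')

# Look the tensor up once, at graph-construction time,
# rather than inside the session loop
softmax_tensor = graph.get_tensor_by_name('predictions:0')

with tf1.Session(graph=graph) as sess:
    preds = sess.run(softmax_tensor,
                     feed_dict={'logits:0': [[1.0, 2.0, 3.0]]})
    print(preds.sum())  # each softmax row sums to 1
```

With a real frozen model, the same structure applies: load the .pb into a graph, fetch the prediction tensor by name, then loop over test images feeding each one through the graph's input tensor.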