Yang7879/AttSets

After training, testing code gives error like "r2n/Reshape_9:0 is missing"

Ajithbalakrishnan opened this issue · 8 comments

Hi, I have tried both retraining the pre-trained network and training from scratch using the training code you provided. After 400 epochs I saved the model in both cases, but I get the same error with both models when testing. The error is at the line below:

Y_pred = tf.get_default_graph().get_tensor_by_name("r2n/Reshape_9:0")

The r2n/Reshape_9 tensor is not there; I confirmed this with TensorBoard. But the graph generated from the downloaded pre-trained network does contain that tensor. How is this possible?

One interesting thing is that I changed the line as below:

Y_pred = tf.get_default_graph().get_tensor_by_name("r2n/Reshape_7:0")

Then I tested with my own trained network, and it gives output, because in the model I trained myself the r2n block produces the 32x32x32 voxels at r2n/Reshape_7:0.

Can you explain why this is so?

@Ajithbalakrishnan Sorry for the delay. Have you solved it? Which version of TF do you use? It seems the default name of the tensor has changed.
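
Since names such as Reshape_9 and Reshape_7 are auto-generated, one option is to look the tensor up rather than hard-code the index. The following is a minimal sketch, not code from the repository. It assumes the prediction is the highest-numbered Reshape op directly under the r2n scope, which matches what is reported in this thread but is otherwise unverified; the helper name `last_reshape_tensor_name` is hypothetical.

```python
import re

def last_reshape_tensor_name(op_names, scope="r2n"):
    """Return '<scope>/Reshape_<N>:0' for the largest N found in op_names."""
    pattern = re.compile(r"^" + re.escape(scope) + r"/Reshape(?:_(\d+))?$")
    best_index, best_name = -1, None
    for name in op_names:
        match = pattern.match(name)
        if match is None:
            continue
        # An op named plain 'Reshape' (no suffix) counts as index 0.
        index = int(match.group(1)) if match.group(1) else 0
        if index > best_index:
            best_index, best_name = index, name
    if best_name is None:
        raise ValueError("no Reshape op found under scope %r" % scope)
    return best_name + ":0"

# With TensorFlow 1.x the names could come from the restored graph:
#   op_names = [op.name for op in tf.get_default_graph().get_operations()]
#   Y_pred = tf.get_default_graph().get_tensor_by_name(
#       last_reshape_tensor_name(op_names))
names = ["Placeholder", "r2n/Reshape", "r2n/Reshape_3", "r2n/Reshape_7", "r2n/Relu"]
print(last_reshape_tensor_name(names))  # r2n/Reshape_7:0
```

If the graph-building code can be changed, giving the output an explicit name when the graph is built would avoid the lookup altogether.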

Thanks for the reply, sir. Yes, I solved it. After training from scratch, I took the voxel output (Y_pred) from the "r2n/Reshape_7:0" tensor.

One more question: in the paper you say the loss function is IoU, but the implementation code seems to use the cross-entropy loss described in the base paper, 3D-R2N2. Why is that?

I found the similar loss functions listed below:
1. IoU
2. Chamfer distance
3. Cross entropy
4. Earth mover's distance (https://arxiv.org/abs/1612.00603)
How can I select a good loss function from these?

@Ajithbalakrishnan As to voxel prediction, the cross-entropy loss is usually used for this binary classification problem.
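
To illustrate the binary-classification view, here is a minimal NumPy sketch of a per-voxel cross-entropy between predicted occupancy probabilities and a 0/1 ground-truth grid. It is an illustration only, not the loss code used in the repository; the function name and the `eps` clipping value are assumptions.

```python
import numpy as np

def voxel_cross_entropy(pred_prob, true_occ, eps=1e-8):
    """Mean binary cross-entropy over all voxels.

    pred_prob: predicted occupancy probabilities in [0, 1]
    true_occ:  ground-truth occupancy, 0 or 1, same shape as pred_prob
    """
    pred = np.clip(np.asarray(pred_prob, dtype=np.float64), eps, 1.0 - eps)
    true = np.asarray(true_occ, dtype=np.float64)
    per_voxel = -(true * np.log(pred) + (1.0 - true) * np.log(1.0 - pred))
    return float(np.mean(per_voxel))

# A prediction of 0.5 everywhere costs log(2) per voxel,
# regardless of the ground truth.
grid = np.zeros((32, 32, 32))
print(voxel_cross_entropy(np.full(grid.shape, 0.5), grid))
```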

Thanks, I got it.
Could you please share the code for computing the IoU?
I tried the code below for IoU, but it raised an error:
InvalidArgumentError (see above for traceback): assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [1]

Code -
import tensorflow as tf
import os
import sys
sys.path.append('..')
import tools as tools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
GPU='0'

vox_res = 32

def load_real_rgbs(test_mv=3):
    obj_rgbs_folder = './Data_sample/amazon_real_rgbs/lamp/'
    rgbs = []
    rgbs_views = sorted(os.listdir(obj_rgbs_folder))
    for v in rgbs_views:
        if not v.endswith('png'): continue
        rgbs.append(tools.Data.load_single_X_rgb_r2n2(obj_rgbs_folder + v, train=False))
    rgbs = np.asarray(rgbs)
    x_sample = rgbs[0:test_mv, :, :, :].reshape(1, test_mv, 127, 127, 3)
    return x_sample, None

def load_shapenet_rgbs(test_mv=3):
    obj_rgbs_folder = './Data_sample/ShapeNetRendering/03001627/1a6f615e8b1b5ae4dbbc9440457e303e/rendering/'
    obj_gt_vox_path = './Data_sample/ShapeNetVox32/03001627/1a6f615e8b1b5ae4dbbc9440457e303e/model.binvox'
    rgbs = []
    rgbs_views = sorted(os.listdir(obj_rgbs_folder))
    for v in rgbs_views:
        if not v.endswith('png'): continue
        rgbs.append(tools.Data.load_single_X_rgb_r2n2(obj_rgbs_folder + v, train=False))
    rgbs = np.asarray(rgbs)
    x_sample = rgbs[0:test_mv, :, :, :].reshape(1, test_mv, 127, 127, 3)
    y_true = tools.Data.load_single_Y_vox(obj_gt_vox_path)
    #########################################
    Y_true_vox = []
    Y_true_vox.append(y_true)
    Y_true_vox = np.asarray(Y_true_vox)
    return x_sample, Y_true_vox
#########################################
def ttest_demo():
    model_path = './Model_released/'
    if not os.path.isfile(model_path + 'model.cptk.data-00000-of-00001'):
        print ('please download our released model first!')
        return

    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.visible_device_list = GPU
    with tf.Session(config=config) as sess:
        saver = tf.train.import_meta_graph(model_path + 'model.cptk.meta', clear_devices=True)
        saver.restore(sess, model_path + 'model.cptk')
        print ('model restored!')

        X = tf.get_default_graph().get_tensor_by_name("Placeholder:0")
        Y_pred = tf.get_default_graph().get_tensor_by_name("r2n/Reshape_9:0")

        x_sample, gt_vox = load_shapenet_rgbs()

        ### IOU #########################################################
        gt_vox = gt_vox.astype(np.float64)

        Y_vox_ = tf.reshape(gt_vox, shape=[-1, vox_res ** 3, 1])
        Y_pred_ = tf.reshape(Y_pred, shape=[-1, vox_res ** 3, 1])
        iou = tf.metrics.mean_iou(labels=Y_vox_, predictions=Y_pred_, num_classes=1)
        sess.run(tf.local_variables_initializer())

        #########################################################
        ## session run
        y_pred, recon_loss, iou_value = sess.run([Y_pred, rec_loss, iou], feed_dict={X: x_sample})
        print("IOU :", iou_value)

    ###### to visualize
    th = 0.25
    y_pred[y_pred >= th] = 1
    y_pred[y_pred < th] = 0
    tools.Data.plotFromVoxels(np.reshape(y_pred, [32, 32, 32]), title='y_pred')
    if gt_vox is not None:
        tools.Data.plotFromVoxels(np.reshape(gt_vox, [32, 32, 32]), title='y_true')
    from matplotlib.pyplot import show
    show()
    #########################

if __name__ == '__main__':
    ttest_demo()
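
The assertion in the error message ("labels out of bound", with the bound printed as [1]) is consistent with the metric requiring every label to be smaller than num_classes. With occupancy labels in {0, 1}, num_classes=1 leaves label 1 out of range, so two classes would be needed, along with predictions thresholded to integer class ids. The NumPy sketch below illustrates a two-class mean IoU computed from a confusion matrix. It is an illustration under those assumptions, not TensorFlow's implementation, and the function name and default threshold are hypothetical.

```python
import numpy as np

def mean_iou_two_class(pred_prob, true_occ, threshold=0.5):
    """Mean of the 'empty' and 'occupied' class IoUs for a voxel grid."""
    num_classes = 2
    pred = (np.asarray(pred_prob) >= threshold).astype(np.int64).ravel()
    true = np.asarray(true_occ).astype(np.int64).ravel()
    # confusion[t, p] counts voxels with true class t and predicted class p.
    confusion = np.bincount(true * num_classes + pred,
                            minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both prediction and truth
    return float(np.mean(intersection[valid] / union[valid]))

pred = np.array([0.9, 0.1, 0.8, 0.2])
true = np.array([1, 0, 0, 1])
print(mean_iou_two_class(pred, true))  # each class has IoU 1/3
```

Note that a mean over both classes also rewards correctly predicted empty voxels, so it is generally not the same number as an IoU computed on occupied voxels only.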

@Ajithbalakrishnan Here's the script for IoU calculation.

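import copy
import numpy as np

def metric_IoU(batch_voxel_occup_pred, batch_voxel_occup_true):
    # Work on a copy so the caller's predictions are not modified in place.
    batch_voxel_occup_pred_ = copy.deepcopy(batch_voxel_occup_pred)
    # Binarize the predicted occupancy at 0.5.
    batch_voxel_occup_pred_[batch_voxel_occup_pred_ >= 0.5] = 1
    batch_voxel_occup_pred_[batch_voxel_occup_pred_ < 0.5] = 0
    # Intersection: voxels occupied in both; union: voxels occupied in either.
    I = batch_voxel_occup_pred_ * batch_voxel_occup_true
    U = batch_voxel_occup_pred_ + batch_voxel_occup_true
    U[U < 1] = 0
    U[U >= 1] = 1
    iou = np.sum(I) * 1.0 / np.sum(U) * 1.0
    return iou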
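
The script above pools all voxels in the batch into a single ratio. If one value per object is wanted instead, a per-sample variant could look like the sketch below. This is an assumption about what might be useful, not part of the repository. The function name is hypothetical, and returning 1.0 when both grids are empty is an arbitrary convention chosen here.

```python
import numpy as np

def per_sample_iou(batch_pred, batch_true, threshold=0.5):
    """Return one IoU per sample; the first axis is the batch axis."""
    pred = np.asarray(batch_pred) >= threshold
    true = np.asarray(batch_true) >= 0.5
    n = pred.shape[0]
    pred = pred.reshape(n, -1)
    true = true.reshape(n, -1)
    inter = np.logical_and(pred, true).sum(axis=1)
    union = np.logical_or(pred, true).sum(axis=1)
    # Assumed convention: two empty grids count as a perfect match.
    return np.where(union > 0, inter / np.maximum(union, 1), 1.0)

pred = np.array([[0.9, 0.2, 0.7, 0.1],
                 [0.6, 0.6, 0.1, 0.1]])
true = np.array([[1, 0, 0, 0],
                 [1, 1, 0, 0]])
print(per_sample_iou(pred, true))  # IoU 0.5 for the first sample, 1.0 for the second
```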

Thank you. It works.
Instead of cross-entropy, have you ever tried any other loss functions, such as:
1. Earth mover's distance (https://arxiv.org/abs/1612.00603)
2. Mean squared false cross-entropy loss (MSFCEL) (https://arxiv.org/abs/1804.06375)
or any other?
Could you please share your opinion?

@Ajithbalakrishnan sorry, I didn't try it.

OK, thanks.
Until now I had only tried training from scratch. But
when retraining the released model (by uncommenting the line in main_AttSets.py),
I got an error. It is given below.

total weights: 52590114
2019-06-12 04:38:53.597409: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX FMA
2019-06-12 04:38:53.597951: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
restoring saved model!
2019-06-12 04:38:57.550933: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key beta1_power_2 not found in checkpoint
Traceback (most recent call last):
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
return fn(*args)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key beta1_power_2 not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1538, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 887, in run
run_metadata_ptr)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1110, in _run
feed_dict_tensor, options, run_metadata)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run
run_metadata)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key beta1_power_2 not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
File "main_AttSets.py", line 475, in <module>
net.build_graph()
File "main_AttSets.py", line 378, in build_graph
self.saver = tf.train.Saver(max_to_keep=1)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1094, in __init__
self.build()
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1106, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1143, in _build
build_save=build_save, build_restore=build_restore)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
restore_sequentially, reshape)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 854, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key beta1_power_2 not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1822, in object_graph_key_mapping
checkpointable.OBJECT_GRAPH_PROTO_KEY)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 359, in get_tensor
status)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 526, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main_AttSets.py", line 475, in <module>
net.build_graph()
File "main_AttSets.py", line 392, in build_graph
self.saver.restore(self.sess, path + 'model.cptk')
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1554, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key beta1_power_2 not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
File "main_AttSets.py", line 475, in <module>
net.build_graph()
File "main_AttSets.py", line 378, in build_graph
self.saver = tf.train.Saver(max_to_keep=1)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1094, in __init__
self.build()
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1106, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1143, in _build
build_save=build_save, build_restore=build_restore)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
restore_sequentially, reshape)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 854, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key beta1_power_2 not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
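
The final message says the graph being built expects a variable named beta1_power_2 that the checkpoint does not contain. In TensorFlow 1.x, beta1_power is a bookkeeping variable created by the Adam optimizer, so a plausible reading is that the optimizer variables in this graph are named differently from those stored in the released model. One common workaround is to restore only the variables that exist in the checkpoint and initialize the rest. The sketch below shows the name-matching step in plain Python. The helper name and the example variable names are hypothetical, and whether this suits main_AttSets.py has not been verified.

```python
def split_restorable(graph_var_names, checkpoint_keys):
    """Split graph variable names into those present in the checkpoint
    and those missing from it, preserving the original order."""
    available = set(checkpoint_keys)
    restorable = [n for n in graph_var_names if n in available]
    missing = [n for n in graph_var_names if n not in available]
    return restorable, missing

# With TensorFlow 1.x the inputs could be gathered roughly like this:
#   checkpoint_keys = [name for name, _ in tf.train.list_variables(path + 'model.cptk')]
#   graph_vars = {v.op.name: v for v in tf.global_variables()}
#   restorable, missing = split_restorable(list(graph_vars), checkpoint_keys)
#   saver = tf.train.Saver(var_list=[graph_vars[n] for n in restorable])
#   sess.run(tf.global_variables_initializer())  # also initializes the missing ones
#   saver.restore(sess, path + 'model.cptk')

# Illustrative names only; the real checkpoint contents were not inspected.
graph_names = ["r2n/conv1/w", "beta1_power_2", "beta2_power_2"]
ckpt_keys = ["r2n/conv1/w", "beta1_power", "beta2_power"]
restorable, missing = split_restorable(graph_names, ckpt_keys)
print(missing)  # ['beta1_power_2', 'beta2_power_2']
```

Optimizer state that is not restored starts fresh, which is usually acceptable for fine-tuning but does not reproduce the exact training state of the released model.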