saver.restore fails

Question

saver.restore fails

Opened this issue 8 years ago · 37 comments

I put model-20160506.ckpt-500000 and the checkpoint file from yobiface into model_check_point, but restoring fails with:

NotFoundError: Tensor name "incept3b/in4_conv1x1_15/batch_norm/cond/incept3b/in4_conv1x1_15/batch_norm/moments/moments_1/variance/ExponentialMovingAverage/biased" not found in checkpoint files ./model_check_point/model-20160506.ckpt-500000

Answer 1 · 2017-02-12T02:05:22.000Z

I have the same issue, could you please check it? Thanks!

Answer 2 · 2017-02-14T09:35:35.000Z

I have the same issue?

Answer 3 · 2017-02-17T07:31:36.000Z

Because of the new version of tensorflow, the existing code needs some upgrades. However, I haven't found a solution yet.

Answer 4 · 2017-02-21T10:40:54.000Z

I have the same issue. Have you figured it out?

Answer 5 · 2017-03-08T08:31:47.000Z

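I ran into this problem too. I'm using tensorflow 0.12.
Which version of tensorflow did the original poster get this running on? The cause is probably that the model file the poster provided doesn't match the parameters in the network, right?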
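One way to test this mismatch idea is to diff the variable names the graph asks for against the names stored in the checkpoint. The sketch below is plain Python and only does the set comparison. `missing_from_checkpoint` is a hypothetical helper, `conv1_7x7/weights` is an illustrative name, and the two lists are assumed to come from your own graph and from a checkpoint reader in your TensorFlow version (for example `tf.train.NewCheckpointReader` in TF 1.x, if your version has it).

```python
def missing_from_checkpoint(graph_names, checkpoint_names):
    """Return the names the graph wants to restore that the checkpoint lacks."""
    return sorted(set(graph_names) - set(checkpoint_names))


# Illustrative names only; in practice both lists come from your own
# graph and your own checkpoint file.
graph_names = [
    "conv1_7x7/weights",
    "incept3b/in4_conv1x1_15/batch_norm/cond/incept3b/in4_conv1x1_15/"
    "batch_norm/moments/moments_1/variance/ExponentialMovingAverage/biased",
]
checkpoint_names = ["conv1_7x7/weights"]

for name in missing_from_checkpoint(graph_names, checkpoint_names):
    print("not in checkpoint:", name)
```

If the list is long and every entry ends in a suffix like `ExponentialMovingAverage/biased`, that would suggest the graph was built by a newer TensorFlow than the one that wrote the checkpoint, rather than a damaged file.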

Answer 6 · 2017-03-09T02:23:14.000Z

I have the same issue.
I've changed mul to multiply and swapped the order of concat's arguments,
yet in the last step it seems that the model is not compatible with the new tensorflow version 1.0.
Could you please tell me which version of tensorflow you used in this project? Thanks a lot.
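For anyone making the same two changes, a small scanner can list the lines that need a look. This is a sketch using only the standard library: `find_porting_hits` is a hypothetical helper, and it covers just the two calls named here (`tf.mul` became `tf.multiply`, and `tf.concat` changed from `(concat_dim, values)` to `(values, axis)` in TensorFlow 1.0). It flags every `tf.concat` call, since it cannot tell whether the arguments were already swapped.

```python
import re

# Patterns for the two TF 0.x calls discussed in this thread
PATTERNS = {
    "tf.mul -> tf.multiply": re.compile(r"\btf\.mul\("),
    "tf.concat: argument order is now (values, axis)": re.compile(r"\btf\.concat\("),
}


def find_porting_hits(source):
    """Return (line_number, note, line) for each line that uses a changed call."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for note, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, note, line.strip()))
    return hits


# Made-up sample source, for demonstration only
sample = """\
scaled = tf.mul(x, 0.5)
merged = tf.concat(3, [a, b])
total = tf.multiply(x, y)
"""

for number, note, line in find_porting_hits(sample):
    print(f"line {number}: {note}: {line}")
```

Fixing these calls only makes the graph build under 1.0. It does not change the variable names stored in the old checkpoint, which is consistent with the restore still failing afterwards.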

Answer 7 · 2017-03-09T02:39:35.000Z

I used tensorflow version 0.12.

Answer 8 · 2017-03-09T02:45:58.000Z

@zy86603465 And did it work on 0.12?

Answer 9 · 2017-03-09T02:49:14.000Z

@Rivrr It didn't. I get the same problem.

Answer 10 · 2017-03-09T02:52:07.000Z

@MiloAnthony Fine... I guess we have to wait until @shanren7 replies.
Thanks anyway.

Answer 11 · 2017-05-22T02:33:19.000Z

@xiaoxinyi @MiloAnthony
It works well with tensorflow 0.11.0
It doesn't work with tensorflow 0.12.0

Answer 12 · 2017-07-04T03:34:09.000Z

I have the same issue.

Answer 13 · 2017-07-05T02:22:23.000Z

Same issue as @Rivrr. I changed mul to multiply and re-ordered the concat params on tf-1.2.1, no luck.

Answer 14 · 2017-07-05T02:30:05.000Z

I replaced this code from "real_time_face_recognition":

    print('Building the facenet embedding model')
    tf.Graph().as_default()
    sess = tf.Session()
    images_placeholder = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, 3), name='input')

    phase_train_placeholder = tf.placeholder(tf.bool, name='phase_train')

    embeddings = network.inference(images_placeholder, pool_type, use_lrn, 1.0,
                                   phase_train=phase_train_placeholder)

    ema = tf.train.ExponentialMovingAverage(1.0)
    saver = tf.train.Saver(ema.variables_to_restore())
    #ckpt = tf.train.get_checkpoint_state(os.path.expanduser(model_dir))
    #saver.restore(sess, ckpt.model_checkpoint_path)

    model_checkpoint_path='./model_check_point/model-20160506.ckpt-500000'
    #ckpt = tf.train.get_checkpoint_state(os.path.expanduser(model_dir))
    #model_checkpoint_path='model-20160506.ckpt-500000'

    #saver.restore(sess, ckpt.model_checkpoint_path)
    saver.restore(sess, model_checkpoint_path)
    print('facenet embedding model built')

with

    print('Building the facenet embedding model')

    images_placeholder = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, 3), name='input')

    phase_train_placeholder = tf.placeholder(tf.bool, name='phase_train')

    batch_norm_params = {
        # Decay for the moving averages.
        'decay': 0.995,
        # epsilon to prevent 0s in variance.
        'epsilon': 0.001,
        # force in-place updates of mean and variance estimates
        'updates_collections': None,
        # Moving averages ends up in the trainable variables collection
        'variables_collections': [tf.GraphKeys.TRAINABLE_VARIABLES],
    }

    # Build the inference graph
    prelogits, _ = network.inference(images_placeholder, 1.0,
                phase_train=phase_train_placeholder, bottleneck_layer_size=128,
                weight_decay=0.0)

    embeddings = tf.nn.l2_normalize(prelogits, 1, 1e-10, name='embeddings')

    model_dir = '/Users/qcy/Documents/GithubProjects/models/facenet/20170512-110547/'

    print('Model directory: %s' % model_dir)
    meta_file, ckpt_file = facenet.get_model_filenames(model_dir)

    print('Metagraph file: %s' % meta_file)
    print('Checkpoint file: %s' % ckpt_file)

    saver = tf.train.import_meta_graph(os.path.join(model_dir, meta_file))
    saver.restore(sess, os.path.join(model_dir, ckpt_file))

    print('facenet embedding model built')

With this change there is no error, using tf 1.0 and the most recent facenet model.

Answer 15 · 2017-07-05T11:49:58.000Z

@shanren7, hi, I got the project running on tensorflow 0.8, but I had to modify one place, in detect_face.py:

    def pad(total_boxes, w, h):
        # compute the padding coordinates (pad the bounding boxes to square)
        tmpw = (total_boxes[:,2]-total_boxes[:,0]+1).astype(np.int32)
        tmph = (total_boxes[:,3]-total_boxes[:,1]+1).astype(np.int32)
        numbox = total_boxes.shape[0]

        dx = np.ones((numbox), dtype=np.int32)
        dy = np.ones((numbox), dtype=np.int32)
        edx = tmpw.copy().astype(np.int32)
        edy = tmph.copy().astype(np.int32)

        x = total_boxes[:,0].copy().astype(np.int32)
        y = total_boxes[:,1].copy().astype(np.int32)
        ex = total_boxes[:,2].copy().astype(np.int32)
        ey = total_boxes[:,3].copy().astype(np.int32)

        # boxes that run past the right edge of the image
        tmp = np.where(ex>w)
        #edx[tmp] = np.expand_dims(-ex[tmp]+w+tmpw[tmp],1)
        edx[tmp] = np.expand_dims(-ex[tmp]+w+tmpw[tmp],0)#1->0
        ex[tmp] = w

        # boxes that run past the bottom edge of the image
        tmp = np.where(ey>h)
        #edy[tmp] = np.expand_dims(-ey[tmp]+h+tmph[tmp],1)
        edy[tmp] = np.expand_dims(-ey[tmp]+h+tmph[tmp],0)#1->0
        ey[tmp] = h

        # boxes that start before the left edge
        tmp = np.where(x<1)
        #dx[tmp] = np.expand_dims(2-x[tmp],1)
        dx[tmp] = np.expand_dims(2-x[tmp],0)#1->0
        x[tmp] = 1

        # boxes that start above the top edge
        tmp = np.where(y<1)
        #dy[tmp] = np.expand_dims(2-y[tmp],1)
        dy[tmp] = np.expand_dims(2-y[tmp],0)#1->0
        y[tmp] = 1

        return dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph

thank you.

Answer 16 · 2017-07-05T11:54:21.000Z

@Rivrr Hello, thank you for your help. Can you tell me where I can download the most recent facenet model, and does the model need conversion? How well does the recognition work?

Answer 17 · 2017-07-13T14:29:08.000Z

@Rivrr Hi, I changed the code as above, but I got errors in several places.
For example, an error occurred at the 'bottleneck_layer_size = 128' argument of 'network.inference()' and at the 'import_meta_graph' call.
So I need more precise instructions on how to use the latest 'facenet' model (for example, whether nn4.py or facenet.py were modified).
My development environment is as follows.
TF: tensorflow 1.0.0
Python: 3.5.3

Thank you.

Answer 18 · 2017-07-24T07:16:01.000Z

@bearsprogrammer
I get the same error as you after changing the code as @Rivrr describes.
Have you solved this?
thanks!

Answer 19 · 2017-07-24T08:03:21.000Z

@Rivrr Thanks for your contribution, but after I replaced the original code with the code you posted above, I get the following error:
File "/home/zzw/PROGRAM/real_time_face_recognition/nn4.py", line 12, in inference
conv1 = facenet.conv(images, 3, 64, 7, 7, 2, 2, 'SAME', 'conv1_7x7', phase_train=phase_train, use_batch_norm=True)
AttributeError: 'module' object has no attribute 'conv'

I also found that the author's file "facenet.py" differs a lot from the latest version, so can you send me your file? Thanks a lot!
here is my email:344184196@qq.com

Answer 20 · 2017-07-25T03:42:51.000Z

@tanjie860110 @bearsprogrammer @zzw1123
I'm sorry, I've been busy recently and couldn't reply right away.
I don't know whether it really works end to end, because I didn't run through to the last step. I only know that I get no error up to this step.
The key is to download the newest model that facenet provides and find the corresponding "saver" code in the facenet project, because that's the part that extracts the parameters from the trained model.

Answer 21 · 2017-07-26T06:28:10.000Z

@Rivrr @zzw1123 @shanren7
First of all, thanks to @Rivrr for providing us with ideas for resolving this error.
OK, I fixed this problem completely.
The key is the latest 'facenet' project. I can't list every place I modified in the code, because there are so many.
As soon as the code is cleaned up, I will upload the project to my github repository. Please wait for me.
thanks!
This is my result video.
https://www.youtube.com/watch?v=T6czH6DLhC4

My development environment is as follows.
TF: tensorflow-gpu 1.2.1
Python: 3.5.3
OS: Windows 10 pro

Answer 22 · 2017-07-26T09:36:50.000Z

Hello, guys. I modified a few places and it works now. I uploaded the modified file real_time_face_detection_and_ recognition.ipynb to Google Drive. Here is the link: https://drive.google.com/open?id=0Bz8GEeGjsd9AZlJPNWd3RDFobTQ. Hope it helps.
Note:
1. Replace detect_face.py and facenet.py with the latest versions from davidsandberg's github.
2. Load the latest model, 20170512-110547 (provided by davidsandberg).
3. Use your own model path.

My development environment is as follows:
tensorflow-gpu 1.2.1
python 2.7.13
ubuntu 16.04

Answer 23 · 2017-07-27T03:35:49.000Z

@JIEMIN1995 Thanks for your work! Another question: how do I train a classifier together with real-time recognition? Loading the facenet model takes a long time!

Answer 24 · 2017-10-12T09:14:56.000Z

When I run mtcnn to detect faces in an image, I get the error below:
2017-10-13 00:39:02.833148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Found device 0 with properties: name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112 pciBusID: 0000:82:00.0 totalMemory: 11.93GiB freeMemory: 11.82GiB
2017-10-13 00:39:02.833193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1055] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla M40, pci bus id: 0000:82:00.0, compute capability: 5.2)
2017-10-13 00:39:02.863038: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 11.93G (12808486912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
My development environment is as follows:
tensorflow-gpu 1.2
python 2.7.9
ubuntu 14.04
The error seems to say that 12 GB of memory was not enough for mtcnn detection. So how much memory does the model need to run? Which GPU do you use for this project? Can anyone tell me?
Thank you very much, I look forward to your reply. @shanren7 @xiaoxinyi @physicso @ten2net @MiloAnthony @Rivrr @JIEMIN1995 @tanjie860110

Answer 25 · 2017-10-14T14:18:46.000Z

@so-as Change "per_process_gpu_memory_fraction=[x]" to "per_process_gpu_memory_fraction=0.4"
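For a sense of scale, here is the arithmetic behind that setting, assuming the fraction is taken of the card's total memory (which is what the 11.93G allocation attempt in the log suggests). `reserved_gib` is a hypothetical helper, and the byte count is the one reported in the log.

```python
GIB = 1024 ** 3


def reserved_gib(total_bytes, fraction):
    """Return how many GiB a given memory fraction of the card corresponds to."""
    return total_bytes * fraction / GIB


total_bytes = 12808486912  # byte count reported in the failed allocation

print(round(reserved_gib(total_bytes, 1.0), 2))  # the whole card
print(round(reserved_gib(total_bytes, 0.4), 2))  # the suggested 0.4
```

In TF 1.x the value is normally passed through `tf.GPUOptions(per_process_gpu_memory_fraction=0.4)` inside the `tf.ConfigProto` given to the session, so the edit belongs wherever the project builds its session config.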

Answer 26 · 2017-11-16T06:27:54.000Z

@xiaoxinyi @shanren7 This error can be solved by changing every "use_batch_norm=True" to "use_batch_norm=False" in nn4.py:

NotFoundError: Tensor name "incept3b/in4_conv1x1_15/batch_norm/cond/incept3b/in4_conv1x1_15/batch_norm/moments/moments_1/variance/ExponentialMovingAverage/biased" not found in checkpoint files ./model_check_point/model-20160506.ckpt-500000

Answer 27 · 2018-03-17T11:56:15.000Z

@JIEMIN1995 Thanks a lot for your modification! But I still get a warning that says:
C:\Users\Administrator\Anaconda3\lib\site-packages\sklearn\utils\validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
I have no idea how to fix it. Also, the detected box only stays on screen for a very short time. Can you tell me how to make it stay a little longer?
Thank you again!

Answer 28 · 2018-03-17T12:22:26.000Z

If you get this repo with git clone, the model file is only 13 KB rather than 80 MB.
You have to open the model page at https://github.com/yobibyte/yobiface/blob/master/model/model-20160506.ckpt-500000 and click the Download button to get the complete file.
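A size check before calling saver.restore can catch this early. The sketch below uses only the standard library. `looks_like_full_checkpoint` is a hypothetical helper, the 1 MB threshold is an assumption chosen to sit between the two sizes mentioned here, and the demo writes a 13 KB stand-in file rather than touching a real model.

```python
import os
import tempfile

# Assumed threshold: well above a 13 KB placeholder, well below an 80 MB model
MIN_CHECKPOINT_BYTES = 1024 * 1024


def looks_like_full_checkpoint(path, min_bytes=MIN_CHECKPOINT_BYTES):
    """Return True if the file exists and is at least min_bytes long."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


# Demo with a temporary 13 KB stand-in file instead of the real model
with tempfile.TemporaryDirectory() as tmp_dir:
    stub = os.path.join(tmp_dir, "model-20160506.ckpt-500000")
    with open(stub, "wb") as handle:
        handle.write(b"\0" * 13 * 1024)
    print(looks_like_full_checkpoint(stub))  # False, the stub is too small
```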

Answer 29 · 2018-03-24T02:17:58.000Z

@JIEMIN1995 Thanks for your contribution, but I still get an error:
error: /io/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215) ssize.width > 0 && ssize.height > 0 in function resize

Answer 30 · 2018-03-24T02:38:08.000Z

@airukongqi The reason for this error is that OpenCV's VideoCapture() didn't capture any frames. Make sure your camera is working and check your OpenCV installation.
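A guard in front of the resize call makes this failure explicit instead of surfacing as an assertion inside OpenCV. This is a minimal sketch: `is_usable_frame` is a hypothetical helper that takes the two values `cap.read()` returns, and `FakeFrame` is a stand-in for the image array so the example runs without a camera or OpenCV installed.

```python
def is_usable_frame(ok, frame):
    """Return True only if the capture succeeded and the frame has non-zero size."""
    if not ok or frame is None:
        return False
    height, width = frame.shape[:2]
    return height > 0 and width > 0


class FakeFrame:
    """Stand-in for the image array that OpenCV returns, for demonstration only."""

    def __init__(self, shape):
        self.shape = shape


print(is_usable_frame(False, None))                      # failed read
print(is_usable_frame(True, FakeFrame((0, 0, 3))))       # empty frame
print(is_usable_frame(True, FakeFrame((480, 640, 3))))   # normal frame
```

In the capture loop this would be used as `ok, frame = cap.read()` followed by skipping the iteration when `is_usable_frame(ok, frame)` is false.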

Answer 31 · 2018-03-24T07:18:21.000Z

@DawnHH Thanks for your answer. I changed image_size from 96 to 160 and that fixed it, but then another error appeared:
ValueError: Expected 2D array, got 1D array instead:......
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
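The message is asking for a 2D input of shape (n_samples, n_features). For a single face embedding that means reshaping to one row. A minimal sketch, assuming NumPy is installed and using a zero vector of length 128 as a stand-in for a real embedding (128 is the bottleneck_layer_size used earlier in this thread):

```python
import numpy as np

# Stand-in for one 128-dimensional embedding; real values come from the network
embedding = np.zeros(128, dtype=np.float32)

# One sample with 128 features, as the error message suggests
single_sample = embedding.reshape(1, -1)

print(embedding.shape)      # (128,)
print(single_sample.shape)  # (1, 128)
```

The reshaped array is what would be passed to the classifier's predict call in place of the 1D embedding.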

Answer 32 · 2018-03-24T07:27:02.000Z

@airukongqi I also get that reshape message, but for me it is a warning instead of an error. I also found that facenet has its own real_time_face_recognition.py in the contributed folder, and that works better. Maybe you can give it a try. https://github.com/davidsandberg/facenet

Answer 33 · 2018-03-24T07:54:20.000Z

@DawnHH Thanks for your help. I tried https://github.com/davidsandberg/facenet/contributed/real_time_face_recognition.py and it works.

Answer 34 · 2018-03-28T03:04:25.000Z

How do I get the feature vector? Please, I want to learn...

Answer 35 · 2018-04-24T08:08:37.000Z

@MiloAnthony The facenet network outputs the face feature vector.

Answer 36 · 2018-05-22T03:07:14.000Z

ValueError: shape mismatch: value array of shape (0,1) could not be broadcast to indexing result of shape (0,)
How can I solve this problem? Does anyone have the same issue?
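This message matches the `np.expand_dims(..., 1)` assignments in the pad function of detect_face.py quoted earlier in this thread: when no box crosses the image edge, the selection is empty, the value has shape (0, 1) and the target has shape (0,). The sketch below reproduces that situation with made-up box values and shows one possible fix, which is to assign the 1D value directly. It assumes NumPy is installed, and the first assignment may or may not raise depending on the NumPy version.

```python
import numpy as np

# Made-up box data, for demonstration only
edx = np.array([10, 12, 9], dtype=np.int32)
ex = np.array([50, 60, 70], dtype=np.int32)
tmpw = edx.copy()
w = 100  # image width; no box crosses it, so the selection below is empty

tmp = np.where(ex > w)
values = -ex[tmp] + w + tmpw[tmp]

try:
    # Original form: the value has shape (0, 1), the target has shape (0,)
    edx[tmp] = np.expand_dims(values, 1)
except ValueError as err:
    print("original form failed:", err)

# 1D value with the same shape as the indexing result, no broadcasting needed
edx[tmp] = values
print(edx)
```

An earlier comment in this thread changes the axis from 1 to 0 instead. Dropping the expand_dims call is a different route to the same goal and is only a suggestion, not what the project's author did.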

Answer 37 · 2018-08-13T09:53:52.000Z

Thanks for your answer @tanjie860110, it works now.