Custom op with only a GPU kernel registered fails to load
jb892 commented
Hi,
I'm new to TensorFlow Serving. I'm trying to serve my trained model with simple_tensorflow_serving. However, when I run the following command, it fails to recognize the custom ops that are registered only with GPU kernels:
simple_tensorflow_serving --model_base_path="./models/pointnet2_sem_seg/" --custom_op_paths="./custom_ops/" --session_config='{"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true, "per_process_gpu_memory_fraction": 0.5}'
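The --session_config JSON above mixes options that, in TensorFlow 1.x, live at two levels of the session configuration: log_device_placement and allow_soft_placement are fields of tf.ConfigProto, while allow_growth and per_process_gpu_memory_fraction are fields of the nested tf.GPUOptions. A minimal stdlib sketch of that split follows; it assumes the JSON is interpreted roughly this way (how simple_tensorflow_serving actually parses it is not checked here), and the helper name split_session_config is hypothetical. Note that the GPU options only matter if TensorFlow can see a GPU at all.

```python
import json

# In TF 1.x these are top-level tf.ConfigProto fields...
CONFIG_PROTO_KEYS = {"log_device_placement", "allow_soft_placement"}
# ...and these belong to the nested tf.GPUOptions message.
GPU_OPTION_KEYS = {"allow_growth", "per_process_gpu_memory_fraction"}


def split_session_config(raw_json):
    """Hypothetical helper: split the flat JSON into ConfigProto and GPUOptions kwargs."""
    flat = json.loads(raw_json)
    config_kwargs = {k: v for k, v in flat.items() if k in CONFIG_PROTO_KEYS}
    gpu_kwargs = {k: v for k, v in flat.items() if k in GPU_OPTION_KEYS}
    # Reject keys this sketch does not know about instead of silently dropping them.
    unknown = sorted(set(flat) - CONFIG_PROTO_KEYS - GPU_OPTION_KEYS)
    if unknown:
        raise ValueError("unrecognized session_config keys: %s" % unknown)
    return config_kwargs, gpu_kwargs


if __name__ == "__main__":
    raw = ('{"log_device_placement": true, "allow_soft_placement": true, '
           '"allow_growth": true, "per_process_gpu_memory_fraction": 0.5}')
    config_kwargs, gpu_kwargs = split_session_config(raw)
    print(config_kwargs)
    print(gpu_kwargs)
    # With TensorFlow 1.x installed, the two dicts would be combined as:
    #   tf.ConfigProto(gpu_options=tf.GPUOptions(**gpu_kwargs), **config_kwargs)
```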
Result:
2019-04-25 10:26:55 INFO custom_op_paths: ./custom_ops/
2019-04-25 10:26:55 INFO debug: False
2019-04-25 10:26:55 INFO enable_cors: True
2019-04-25 10:26:55 INFO model_config_file:
2019-04-25 10:26:55 INFO host: 0.0.0.0
2019-04-25 10:26:55 INFO secret_key: secret.key
2019-04-25 10:26:55 INFO model_name: default
2019-04-25 10:26:55 INFO port: 8500
2019-04-25 10:26:55 INFO enable_auth: False
2019-04-25 10:26:55 INFO model_platform: tensorflow
2019-04-25 10:26:55 INFO reload_models: False
2019-04-25 10:26:55 INFO enable_colored_log: False
2019-04-25 10:26:55 INFO log_level: info
2019-04-25 10:26:55 INFO auth_username: admin
2019-04-25 10:26:55 INFO auth_password: admin
2019-04-25 10:26:55 INFO model_base_path: ./models/pointnet2_sem_seg/
2019-04-25 10:26:55 INFO gen_client:
2019-04-25 10:26:55 INFO bind: 0.0.0.0:8500
2019-04-25 10:26:55 INFO session_config: {"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true, "per_process_gpu_memory_fraction": 0.5}
2019-04-25 10:26:55 INFO download_inference_images: True
2019-04-25 10:26:55 INFO secret_pem: secret.pem
2019-04-25 10:26:55 INFO enable_ssl: False
2019-04-25 10:26:55 INFO Load the so file from: ./custom_ops/tf_grouping_so.so
2019-04-25 10:26:55 INFO Load the so file from: ./custom_ops/tf_interpolate_so.so
2019-04-25 10:26:55 INFO Load the so file from: ./custom_ops/tf_sampling_so.so
2019-04-25 10:26:55.137247: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2019-04-25 10:26:55.140876: I tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2019-04-25 10:26:55 INFO Put the model version: 1 online, path: ./models/pointnet2_sem_seg/1
INFO:tensorflow:Restoring parameters from ./models/pointnet2_sem_seg/1/variables/variables
2019-04-25 10:26:55 INFO Restoring parameters from ./models/pointnet2_sem_seg/1/variables/variables
Traceback (most recent call last):
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
self._extend_graph()
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'FarthestPointSample' with these attrs. Registered devices: [CPU,XLA_CPU], Registered kernels:
device='GPU'
[[{{node layer1/FarthestPointSample}} = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'FarthestPointSample' with these attrs. Registered devices: [CPU,XLA_CPU], Registered kernels:
device='GPU'
[[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175) = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]
Caused by op 'layer1/FarthestPointSample', defined at:
File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
from simple_tensorflow_serving.server import main
File "<frozen importlib._bootstrap>", line 968, in _find_and_load
File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 697, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
session_config)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
self.load_saved_model_version(model_version)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
session, [tf.saved_model.tag_constants.SERVING], model_file_path)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
return loader.load(sess, tags, import_scope, **saver_kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 350, in load
**saver_kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 278, in load_graph
meta_graph_def, import_scope=import_scope, **saver_kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'FarthestPointSample' with these attrs. Registered devices: [CPU,XLA_CPU], Registered kernels:
device='GPU'
[[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175) = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
from simple_tensorflow_serving.server import main
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
session_config)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
self.load_saved_model_version(model_version)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
session, [tf.saved_model.tag_constants.SERVING], model_file_path)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
return loader.load(sess, tags, import_scope, **saver_kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 351, in load
self.restore_variables(sess, saver, import_scope)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 303, in restore_variables
saver.restore(sess, self._variables_path)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1582, in restore
err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
No OpKernel was registered to support Op 'FarthestPointSample' with these attrs. Registered devices: [CPU,XLA_CPU], Registered kernels:
device='GPU'
[[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175) = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]
Caused by op 'layer1/FarthestPointSample', defined at:
File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
from simple_tensorflow_serving.server import main
File "<frozen importlib._bootstrap>", line 968, in _find_and_load
File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 697, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
session_config)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
self.load_saved_model_version(model_version)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
session, [tf.saved_model.tag_constants.SERVING], model_file_path)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
return loader.load(sess, tags, import_scope, **saver_kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 350, in load
**saver_kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 278, in load_graph
meta_graph_def, import_scope=import_scope, **saver_kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
No OpKernel was registered to support Op 'FarthestPointSample' with these attrs. Registered devices: [CPU,XLA_CPU], Registered kernels:
device='GPU'
[[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175) = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]
Has anyone come across this issue? What should I do next?
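The key line in the log is the InvalidArgumentError: TensorFlow lists the devices registered in this process ([CPU,XLA_CPU]) and the devices that have a FarthestPointSample kernel (only GPU). Because the two sets do not overlap, the op cannot be placed anywhere, and allow_soft_placement cannot help since there is no CPU kernel to fall back to. A small stdlib sketch that extracts both lists from such a message and checks for overlap; the function names are hypothetical and the regexes assume the message format shown in this log.

```python
import re


def parse_kernel_error(message):
    """Return (op_name, registered_devices, kernel_devices) from a 'No OpKernel' message."""
    op = re.search(r"support Op '([^']+)'", message)
    devices = re.search(r"Registered devices: \[([^\]]*)\]", message)
    # Kernel lines look like: device='GPU' (single quotes, unlike _device="...")
    kernels = re.findall(r"device='([^']+)'", message)
    registered = [d.strip() for d in devices.group(1).split(",")] if devices else []
    return (op.group(1) if op else None), registered, kernels


def runnable_devices(registered, kernel_devices):
    """Device types that are both present and have a kernel (exact type match)."""
    return sorted(set(registered) & set(kernel_devices))


if __name__ == "__main__":
    msg = ("No OpKernel was registered to support Op 'FarthestPointSample' with these "
           "attrs. Registered devices: [CPU,XLA_CPU], Registered kernels:\n"
           "  device='GPU'")
    op, registered, kernels = parse_kernel_error(msg)
    print(op, registered, kernels, runnable_devices(registered, kernels))
    # FarthestPointSample ['CPU', 'XLA_CPU'] ['GPU'] []
```

An empty result means either a GPU device has to become visible to TensorFlow, or the op needs a CPU kernel registered in the custom op library.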
jb892 commented
It seems that no GPU is available when the model is restored from the checkpoint: the error lists only [CPU,XLA_CPU] as registered devices. Is that right?
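One way to check this outside the server is to ask TensorFlow which devices it can see, in the same environment that runs simple_tensorflow_serving. A minimal sketch, assuming TensorFlow 1.x (it uses device_lib.list_local_devices() from tensorflow.python.client); the helper names are hypothetical. If GPU is missing from the list, common causes include a CPU-only TensorFlow build (tf.test.is_built_with_cuda() returns False) or CUDA_VISIBLE_DEVICES being set so that every GPU is hidden.

```python
import os


def cuda_devices_hidden(env=None):
    """True if CUDA_VISIBLE_DEVICES is set to '' or '-1', which hides every GPU."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return False
    return value.strip() in ("", "-1")


def local_device_types():
    """Sorted device types TensorFlow can see, or None if TensorFlow is not importable."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return sorted({d.device_type for d in device_lib.list_local_devices()})


if __name__ == "__main__":
    print("GPUs hidden by CUDA_VISIBLE_DEVICES:", cuda_devices_hidden())
    types = local_device_types()
    if types is None:
        print("TensorFlow is not importable in this environment")
    else:
        print("Visible device types:", types)
        if "GPU" not in types:
            # Same situation as the log: GPU-only kernels have nowhere to run.
            print("No GPU device: ops with only GPU kernels cannot be placed")
```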