gcloud ai-platform fails with TFv2 Saved Model

Question

ehennis opened this issue 6 years ago · 6 comments

Describe the bug

I am following this tutorial: https://cloud.google.com/ml-engine/docs/tensorflow/deploying-models and using a model I created in TFv2. I am able to create the SavedModel in my Colab notebook using the following code:

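import time
import tensorflow as tf

# Export into a timestamped directory; restored_model is the Keras model from earlier in the notebook.
saved_model_path = "/content/gdrive/My Drive/Colab Notebooks/{}".format(int(time.time()))
tf.keras.experimental.export_saved_model(restored_model, saved_model_path)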

This created a folder containing the .pb file plus the assets and variables folders. I then uploaded that folder to my bucket in Google Cloud and ran the predict command:
gcloud ai-platform local predict --model-dir=$MODEL_DIR --text-instances ci.txt --framework TENSORFLOW

ci.txt is a comma-separated list of my input numbers.
I get the following error, which shows my input values:

Traceback (most recent call last):
File "lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 184, in <module>
main()
File "lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 179, in main
signature_name=args.signature_name)
File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_lib.py", line 102, in local_predict
predictions = model.predict(instances, signature_name=signature_name)
File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_utils.py", line 268, in predict
preprocessed, stats=stats, **kwargs)
File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/frameworks/tf_prediction_lib.py", line 363, in predict
"Exception during running the graph: " + str(e))
cloud.ml.prediction.prediction_utils.PredictionError: Failed to run the provided model: Exception during running the graph: invalid literal for float(): 81,71,80,76,1,3 (Error code: 2)
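
The message "invalid literal for float(): 81,71,80,76,1,3" suggests that the whole line of ci.txt reached the model as a single string, which would be consistent with --text-instances treating each line as one text instance. A minimal sketch, assuming the model takes one vector of numbers per instance, converts each comma-separated line into a JSON list that could be passed with --json-instances instead. The helper names and the output file name instances.json are hypothetical, and this change alone may not be enough if the TF 2 SavedModel itself is not supported.

```python
import json


def csv_line_to_json_instance(line):
    """Turn one comma-separated line of numbers into a JSON list string."""
    values = [float(field) for field in line.strip().split(",")]
    return json.dumps(values)


def convert_file(text_path, json_path):
    """Write one JSON instance per non-empty line of text_path."""
    with open(text_path) as src, open(json_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(csv_line_to_json_instance(line) + "\n")


# The sample line is taken from the error message above.
print(csv_line_to_json_instance("81,71,80,76,1,3"))
```

After calling convert_file("ci.txt", "instances.json"), the command would become: gcloud ai-platform local predict --model-dir=$MODEL_DIR --json-instances instances.json --framework TENSORFLOW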

What sample is this bug related to?
Not sure how to answer this question. I listed the tutorial I followed above.

Source code / logs

Here is the model: ver1.zip

To Reproduce
Steps to reproduce the behavior:

1. Train a model using TFv2 and Keras.
2. Export it using tf.keras.experimental.export_saved_model(..).
3. Upload it to AI Platform.
4. See the error.

Expected behavior
I expect to be able to use the system and call into the model for predictions.

System Information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 and Google Colab
Framework and version (Tensorflow, scikit-learn, XGBoost): TFv2 Alpha (v1.12.0-9492-g2c319fb415 2.0.0-alpha0)
Python version: 3.6.7
Exact command to reproduce: gcloud ai-platform local predict --model-dir=$MODEL_DIR --text-instances ci.txt --framework TENSORFLOW

Additional context

Full Error

ERROR: (gcloud.ai-platform.local.predict) 2019-05-14 10:19:23.674836: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-14 10:19:23.687172: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-05-14 10:19:23.687418: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x556c01ae34a0 executing computations on platform Host. Devices:
2019-05-14 10:19:23.687444: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
WARNING:tensorflow:From /google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
WARNING:tensorflow:From /google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:root:Error updating signature __saved_model_init_op: The name 'init_1' refers to an Operation, not a Tensor. Tensor names must be of the form "<op_name>:<output_index>".
ERROR:root:Exception during running the graph: invalid literal for float(): 81,71,80,76,1,3
Traceback (most recent call last):
File "lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 184, in <module>
main()
File "lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 179, in main
signature_name=args.signature_name)
File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_lib.py", line 102, in local_predict
predictions = model.predict(instances, signature_name=signature_name)
File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_utils.py", line 268, in predict
preprocessed, stats=stats, **kwargs)
File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/frameworks/tf_prediction_lib.py", line 363, in predict
"Exception during running the graph: " + str(e))
cloud.ml.prediction.prediction_utils.PredictionError: Failed to run the provided model: Exception during running the graph: invalid literal for float(): 81,71,80,76,1,3 (Error code: 2)

Answer 1 · 2019-06-10T22:52:41.000Z

We currently do not support TF 2.0 as part of predictions. These are the supported versions for AI Platform: https://cloud.google.com/ml-engine/docs/tensorflow/runtime-version-list

I would suggest you use a Deep Learning VM image with TF serving.
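
For readers taking the TF Serving route, here is a rough sketch of calling a model through TensorFlow Serving's REST predict endpoint using only the Python standard library. It assumes a server is already running with its REST API on port 8501 (the port commonly used in the TF Serving Docker examples); the host, the model name my_model, and the helper names are hypothetical, and the instance shape must match whatever the exported model expects.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(host, model_name, instances):
    """Send the request and return the parsed "predictions" field."""
    request = build_predict_request(host, model_name, instances)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["predictions"]


# Building the request does not contact the server.
request = build_predict_request("localhost", "my_model", [[81, 71, 80, 76, 1, 3]])
print(request.full_url)
```

Calling predict("localhost", "my_model", [[81, 71, 80, 76, 1, 3]]) would then send the same values used in the report above, provided a server is actually listening.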

Answer 2 · 2019-06-12T00:48:38.000Z

@ehennis I will keep this issue open so that I can provide an update on when we will support this.

Answer 3 · 2019-06-28T06:35:37.000Z

We are testing this internally and will update soon.

Answer 4 · 2019-07-14T18:38:38.000Z

Is there any update on when TF2 will be supported?

Answer 5 · 2019-08-26T07:45:20.000Z

Once it reaches General Availability. Closing for now.

Answer 6 · 2019-09-23T21:54:26.000Z

Gonzalos meant to close this issue (see prior comment), but overlooked it. Closing it now.