keras-team/tf-keras

efficientnetBx model.save() fails due to serialization problem with tf2.10.0

jeromemassot opened this issue · 20 comments

System information.

  • Have I written custom code: derived from Keras image_classification_efficientnet_fine_tuning
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Win10Pro
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): tf2.10.0
  • Python version: 3.10.2
  • Bazel version (if compiling from source):
  • GPU model and memory: NVIDIA RTX TITAN 24Gb
  • Exact command to reproduce: model.save()

Describe the problem clearly here. Be sure to convey here why it's a bug in Keras or why the requested feature is needed.

Describe the current behavior.
model save() fails and reports a serialization problem.

Describe the expected behavior.
saving keras model without error.

Contributing.

  • Do you want to contribute a PR? (yes/no): no

Source code / logs.

WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 273). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: ./models/EfficientNetB7_Naiads.h5py\assets
INFO:tensorflow:Assets written to: ./models/EfficientNetB7_Naiads.h5py\assets
Output exceeds the size limit. Open the full output data in a text editor

TypeError Traceback (most recent call last)
Cell In [31], line 1
----> 1 model.save('./models/EfficientNetB7_Naiads.h5py')

File e:\02- Vision Projects\01- Naiads Projects\notebooks.venv\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb

File C:\Python310\lib\json\encoder.py:199, in JSONEncoder.encode(self, o)
195 return encode_basestring(o)
196 # This doesn't pass the iterator directly to ''.join() because the
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)

File C:\Python310\lib\json\encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
252 else:
253 _iterencode = _make_iterencode(
...
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

TypeError: Unable to serialize [2.0897 2.1129 2.1082] to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.
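Editorial sketch of the failure mode in the traceback above, using only the standard library: the JSON encoder raises a TypeError as soon as a config value is an object it does not recognize. FakeEagerTensor and layer_config are hypothetical stand-ins for illustration, not TensorFlow or Keras names, and the real Keras error message is worded differently.

```python
import json


class FakeEagerTensor:
    """Hypothetical stand-in for an EagerTensor; NOT a real TensorFlow class."""

    def __init__(self, values):
        self.values = values


def try_serialize(config):
    """Return the JSON string, or a short description of the TypeError."""
    try:
        return json.dumps(config)
    except TypeError as err:
        return f"TypeError: {err}"


# Hypothetical layer config holding a tensor-like object: serialization fails.
layer_config = {"name": "rescaling", "scale": FakeEagerTensor([2.0897, 2.1129, 2.1082])}
failed = try_serialize(layer_config)

# The same config with a plain list of floats serializes without error.
plain_config = {"name": "rescaling", "scale": [2.0897, 2.1129, 2.1082]}
succeeded = try_serialize(plain_config)

print(failed)
print(succeeded)
```

The takeaway, under these assumptions, is that the config must contain only plain Python values by the time it reaches the encoder.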

@jeromemassot
In order to expedite the trouble-shooting process, please provide a code snippet to reproduce the issue reported here. Thank you!

Hi @sushreebarsa Thanks for your reply. The code that I am using is the carbon-copy of this one available in the Keras example. https://github.com/keras-team/keras-io/blob/master/examples/vision/image_classification_efficientnet_fine_tuning.py

Two tiny differences:

1- The fit() method is using an early stopping callback as follows:

early_stopping_callback = tf.keras.callbacks.EarlyStopping(
monitor='val_loss', min_delta=0.05,
patience=3, restore_best_weights=True
)

hist = model.fit(
train_ds, epochs=epochs,
steps_per_epoch=train_steps_per_epoch,
validation_data=validation_ds,
validation_steps=validation_steps_per_epoch,
callbacks = [early_stopping_callback],
verbose=1
)

2- the model.save('./models/EfficientNetB7_xx.h5py') after the training is completed. This last command creates the serialization problem.

Thanks for your help.
Best regards
Jerome

@sushreebarsa, were you able to replicate? I don't see a gist

Hi @jbischof,

I am unable to replicate the issue with the exact code mentioned by @jeromemassot due to a TPU issue with my Colab, and also because it is a large model. But I am pretty sure this is due to a serialization problem with the EfficientNet models from tf.keras.applications.efficientnet. These models work fine with version 2.9.2 and have had the serialization issue since version 2.10, including tf-nightly. Please refer to the attached gist with minimal code to replicate the problem. The model saves without error if we use tf.saved_model.save() instead of model.save(). Refer to gist1, which replicates the issue with 2.10 and nightly versions, and gist2, which replicates the issue for all EfficientNet models.

All of the models tested above, including the one in this issue (#383), produce the same serialization error:
TypeError: Unable to serialize [2.0896919 2.1128857 2.1081853] to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>
Even the values in the unserializable list are identical in every case.

@jeromemassot, could you also please check whether saving the model works with version 2.9.2, and alternatively with tf.saved_model.save()?

I get the exact same error with even the same numbers in the array which are not serializable. Downgrading tf doesn't work for me because of other dependencies... Anyone got any ideas?

@SuryanarayanaY
Saving with tf.saved_model.save() works, however loading only works with tf.saved_model.load() which makes it pretty much useless, as it doesn't come as a keras object.

Trying to load the model with keras.models.load_model() results in a ValueError: Unable to create a Keras model from SavedModel at D:\model_weights\EfficientNet. This SavedModel was exported with 'tf.saved_model.save', and lacks the Keras metadata file. Please save your Keras model by calling 'model.save' or 'tf.keras.models.save_model'. Note that you can still load this SavedModel with 'tf.saved_model.load'.

Same problem here:

import tensorflow as tf
model = tf.keras.applications.efficientnet.EfficientNetB0()
model.save('model.json')

->

Traceback (most recent call last):
  File "//main.py", line 4, in <module>
    model.save('model.json')
  File "/usr/local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
TypeError: Unable to serialize [2.0896919 2.1128857 2.1081853] to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.

Here is a Dockerfile to easily reproduce it:

FROM python:3.10.8

RUN apt-get update
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get install -y build-essential cmake

RUN pip3 install tensorflow==2.10.0

RUN echo "\n\
import tensorflow as tf\n\
model = tf.keras.applications.efficientnet.EfficientNetB0()\n\
model.save('model.json')\n\
" > main.py

RUN python3 main.py

Also facing this issue.

Encountered an identical issue with EfficientNetB1 and tensorflow-macos==2.10.0

Got the exact same issue with TF 2.10 and TF 2.11 (tried to save the model in SavedModel format and in H5 format, neither worked). Downgrading to TF 2.9 helped for now, but it would be really nice to be able to use the features of more recent TF versions.

NB if useful:

This issue seems to be triggered by the fact the hard-baked normalisation constants are getting evaluated to an EagerTensor before the scaling layer is built - see here

I fixed this locally by moving the logic into python:

At the top:

IMAGENET_STDDEV_RGB = [0.229, 0.224, 0.225]
IMAGENET_STDDEV_RGB = [1/math.sqrt(i) for i in IMAGENET_STDDEV_RGB]

Then on build just do:
x = layers.Rescaling(IMAGENET_STDDEV_RGB)(x)

Don't have time to raise a PR rn w/ failure test cases (also suspect this isn't the most elegant solution), but thought at least a guide to a hotfix might help anyone that does!
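Editorial sketch, standard library only, of why computing the constants in plain Python sidesteps the problem: the results are ordinary floats that the JSON encoder accepts, and they match the values printed in the error message. The "scale" key and the config dictionary are hypothetical and only illustrate serialization; this is not the Keras source.

```python
import json
import math

# ImageNet per-channel constants as quoted in the comment above.
IMAGENET_STDDEV_RGB = [0.229, 0.224, 0.225]

# Computed with math.sqrt, so every factor is a built-in Python float.
rescale_factors = [1.0 / math.sqrt(stddev) for stddev in IMAGENET_STDDEV_RGB]

# A list of floats serializes cleanly (hypothetical config dictionary).
config_json = json.dumps({"scale": rescale_factors})

print([round(factor, 4) for factor in rescale_factors])  # [2.0897, 2.1129, 2.1082]
print(config_json)
```

The rounded factors agree with the [2.0897 2.1129 2.1082] array reported in the TypeError, which is consistent with the rescaling constants being the unserializable value.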

Thanks @hctomkins

So, is the TF team planning to commit this fix in a future version of the EfficientNet code, or should we continue to use this local fix in our code?

Thanks
Best regards
Jerome

Thanks @hctomkins
Based on your suggestion, for now we implemented the following workaround in our Dockerfile:

# Workaround for TF 2.10 & 2.11 issue: https://github.com/keras-team/tf-keras/issues/383: "efficientnetBx model.save() fails due to serialization problem with tf2.10.0"
RUN sudo sed -i 's/IMAGENET_STDDEV_RGB = \[0.229, 0.224, 0.225\]/IMAGENET_STDDEV_RGB = \[1 \/ math.sqrt(i) for i in \[0.229, 0.224, 0.225\]\]/g' \
    /usr/local/lib/python3.8/dist-packages/keras/applications/efficientnet.py && \
    sudo sed -i 's/x = layers.Rescaling(1.0 \/ tf.math.sqrt(IMAGENET_STDDEV_RGB))(x)/x = layers.Rescaling(IMAGENET_STDDEV_RGB)(x)/g' \
    /usr/local/lib/python3.8/dist-packages/keras/applications/efficientnet.py

Same here: Unable to serialize [2.0896919 2.1128857 2.1081853] to JSON
in 2.11.0.
Easy to reproduce: I simply followed the "Load video data" tutorial and used the Keras model at the bottom.
https://www.tensorflow.org/tutorials/load_data/video#next_steps
Then just add model.save() at the end and it will crash. Ironically, saving as a TF-lite model works fine!

I can see why people moved to PyTorch.


Apply keras-team/keras@5b931e6 fix manually (as @hctomkins mentioned)

location: lib/python3.10/site-packages/keras/applications/efficientnet.py (py3.10)

EDIT this:

x = layers.Rescaling(1.0 / tf.math.sqrt(IMAGENET_STDDEV_RGB))(x)

TO:

x = layers.Rescaling(
    [1.0 / math.sqrt(stddev) for stddev in IMAGENET_STDDEV_RGB]
)(x)

@jeromemassot ,

The PR has been merged to the master branch. I have tested the code with the latest tf-nightly (2.13.0-dev20230409) and there is no error now. Please refer to the attached gist.

If anybody still faces the issue in tf-nightly, please let us know.

It seems the commit has not been cherry-picked to the latest release versions. I will raise it with the concerned team and let you know whether it can be cherry-picked to TF 2.12. Until then, users are requested to use tf-nightly.

Thanks!
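Editorial sketch of a version guard for deciding whether the local workaround is still needed. The affected range (2.10 through 2.12) is an assumption pieced together from this thread: 2.9.2 reportedly works, 2.10 and 2.11 fail, the fix had not been cherry-picked to 2.12, and the 2.13 nightly works. The function name efficientnet_save_affected is hypothetical; in practice the string would come from tf.__version__.

```python
import re


def efficientnet_save_affected(tf_version: str) -> bool:
    """Return True if this thread suggests model.save() fails on this version.

    Assumption from the thread, not an official statement: 2.10 through 2.12
    are affected; 2.9.x and 2.13 nightlies are not.
    """
    match = re.match(r"(\d+)\.(\d+)", tf_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {tf_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return (2, 10) <= major_minor <= (2, 12)


for version in ["2.9.2", "2.10.0", "2.11.0", "2.13.0-dev20230409"]:
    print(version, efficientnet_save_affected(version))
```

Comparing (major, minor) tuples rather than raw strings avoids the usual pitfall where "2.9" sorts after "2.10" lexicographically.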

Just use EfficientNetV2. It works like a charm, does not have this problem, and performs better anyway.