mrdbourke/tensorflow-deep-learning

Notebook 05: load_weights results in "Incompatible tensor with shape (1280, 10)..."

ivanthecrazy opened this issue · 23 comments

When creating model_2 and trying to load the weights with

model_2.load_weights(checkpoint_path)

I'm getting the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-55-d2e3006b884f>](https://localhost:8080/#) in <cell line: 2>()
      1 # Load model from checkpoint, that way we can fine-tune from the same stage the 10 percent data model was fine-tuned from
----> 2 model_2.load_weights(checkpoint_path) # revert model back to saved weights

1 frames
[/usr/local/lib/python3.9/dist-packages/tensorflow/python/ops/resource_variable_ops.py](https://localhost:8080/#) in _restore_from_tensors(self, restored_tensors)
    718             self.handle, self.shape, restored_tensor)
    719       except ValueError as e:
--> 720         raise ValueError(
    721             f"Received incompatible tensor with shape {restored_tensor.shape} "
    722             f"when attempting to restore variable with shape {self.shape} "

ValueError: Received incompatible tensor with shape (1280, 10) when attempting to restore variable with shape (1, 1, 1152, 48) and name Adam/m/block7a_se_reduce/kernel:0.

I tried to download the notebook from this repo, but have the same result.

Hi @ivanthecrazy,

Investigating this issue myself.

I'm going through the following resources:

Looks like it's an issue with newer versions of TensorFlow and tf.keras.applications.efficientnet models and using the load_weights() method.

My current solution is installing TensorFlow 2.9.0 (as suggested by the links above) and running it from there.

For example:

# Install TensorFlow 2.9.0 to avoid issues (later versions may work)
# -U stands for "update" and "-q" stands for "quiet"
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

I will make sure this works and investigate it further if something is wrong.

I'll post another comment here once I've fixed the notebook: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

Update: I've confirmed that running notebook 05 works end-to-end with TensorFlow 2.9.0 (as per the links above).

Install TensorFlow 2.9.0 with:

# Install TensorFlow 2.9.0 to avoid issues (later versions may work)
# -U stands for "update" and "-q" stands for "quiet"
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

I'm not quite sure what's happening with later versions (e.g. 2.10.0+), the issues above seem to be long standing.

The notebook code has been updated to reflect installing TensorFlow 2.9.0 at the start.

See the updated code here and let me know how it goes: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

I had the same problem. I can verify that Daniel's updated notebook (TF version 2.9.0) works fine. Furthermore, all the "Model failed to serialize as JSON" warnings that appeared while fitting the various models have disappeared.

Hi @filipposkar , glad to hear you got it fixed!

Looks like this should also be fixed further in upcoming versions of TensorFlow (e.g. 2.13+).

For now, it looks like TensorFlow 2.9.0 works.

See this comment here: keras-team/tf-keras#383

Update: looks like TensorFlow 2.9.0 is still the most stable here, see: #553

TL;DR tried tf-nightly(2.14.0-dev20230520) and it still broke.

I have an issue with changing the version of TensorFlow: '!pip install -U -q tensorflow==2.9.0' doesn't work.

@OFALOFAL

Here is my temporary workaround.

The contributing factor seems to be the line of code at block [29] of @mrdbourke's 05_transfer_learning notebook, where the TF install upgrades to the latest version of TF. However, we are working with TensorFlow 2.9.0, and some of its dependencies are deprecated in the latest version.

  1. Remove import tensorflow as tf from block [29] of the 05_transfer_learning notebook from GitHub.

  2. Scroll to the top of your code, to block [1], and use
  • !pip uninstall -y tensorflow to remove the 2.12.x version

  3. Insert the tensorflow==2.9.0 install and import in block [2]
  • !pip install -U -q tensorflow==2.9.0
  • import tensorflow as tf
  • print(tf.__version__)
  • from tensorflow import keras

  (See the notes on protobuf below, as the dependency is incompatible; however, the results came out as expected.)

  4. Clear all outputs and run the code from the beginning.

The Protobuf dependency used in TensorFlow is used to serialize and deserialize data. This means that it can be used to convert data from one format to another, such as from a Python object to a binary file. This is useful for TensorFlow because it allows models to be saved and loaded easily, and it also allows for communication between different TensorFlow components.

Specifically, Protobuf is used in TensorFlow for the following purposes:

  • Serializing and deserializing TensorFlow models: When a TensorFlow model is saved, it is serialized into a Protobuf file. This file can then be loaded back into TensorFlow to restore the model.

  • Communicating between different TensorFlow components: TensorFlow components, such as the TensorFlow Serving server and the TensorFlow Lite library, use Protobuf to communicate with each other. This allows them to exchange data in a format that is both efficient and easy to understand.

  • Providing a common data format for TensorFlow and other libraries: Protobuf is a widely used data format, so it can also be used to communicate with other libraries that use Protobuf. This makes it easier to integrate TensorFlow with other libraries, such as the gRPC RPC framework.

Overall, the Protobuf dependency used in TensorFlow is a valuable tool that allows TensorFlow models to be saved, loaded, and communicated with other components. It is a versatile data format that is widely used in the industry, and it makes TensorFlow more accessible to other libraries and frameworks. - Source: Bard

Your version of protobuf will most likely result in errors with tensorflow-datasets, which requires a much more recent version. The issue is that it requires a module called builder.py that's not present in version 3.19.x.
The best workaround so far is to force-reinstall protobuf 3.20.3 using pip install --force-reinstall "protobuf==3.20.3". Pip will complain about incompatibilities left and right, but I've found it to work without issues so far with TF 2.9 to 2.12 with tensorflow-datasets and other libraries.
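
To check whether the installed protobuf already ships that module before reinstalling anything, something like the following might help. It is only a sketch: it assumes the module in question is google.protobuf.internal.builder (the comment above only says builder.py), and the helper name has_protobuf_builder is made up for illustration.

```python
import importlib.util


def has_protobuf_builder() -> bool:
    """Return True if google.protobuf.internal.builder can be located.

    Assumption: this is the builder.py module mentioned above.
    """
    try:
        spec = importlib.util.find_spec("google.protobuf.internal.builder")
    except ImportError:
        # protobuf (or the google.protobuf package) is not installed at all.
        return False
    return spec is not None


print("protobuf builder module available:", has_protobuf_builder())
```

If this prints False, the force-reinstall described above is probably worth trying, followed by a runtime restart.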

Hi @mrdbourke,

I ran the lines suggested:

!pip uninstall -y tensorflow
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

But it showed this.

Found existing installation: tensorflow 2.9.0
Uninstalling tensorflow-2.9.0:
  Successfully uninstalled tensorflow-2.9.0
WARNING: Ignoring invalid distribution -ensorflow (/usr/local/lib/python3.10/dist-packages)
WARNING: Ignoring invalid distribution -ensorflow (/usr/local/lib/python3.10/dist-packages)
TensorFlow version: 2.12.0

Did you restart the runtime? IIRC, TensorFlow tells you the change will only take effect after restarting it.
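
One way to tell whether a restart is still pending is to compare the version pip has installed with the version of the module already loaded in memory. This is a rough sketch using only the standard library. The helper name restart_needed is hypothetical, and it assumes the pip distribution is called "tensorflow" (it may differ, for example for CPU-only builds).

```python
import sys
from importlib import metadata


def restart_needed(package: str = "tensorflow") -> bool:
    """Return True if `package` is already imported with a version that
    differs from the one currently installed by pip."""
    module = sys.modules.get(package)
    if module is None:
        # Not imported yet: the next import picks up the installed version.
        return False
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The old module is still in memory but the package was uninstalled.
        return True
    # Fall back to the installed version if the module has no __version__.
    return getattr(module, "__version__", installed) != installed


print("Restart needed:", restart_needed("tensorflow"))
```

In the output above, the loaded module reports 2.12.0 while pip was just asked to install 2.9.0, which is the situation this check is meant to flag.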

Hi all,

After much troubleshooting, I've found the best fix for tf.keras.applications.EfficientNetB0 problems is to simply upgrade to tf.keras.applications.efficientnet_v2.EfficientNetV2B0.

You can see a full write-up of the fix here: #575

It worked for me if I recompile the model before loading the weights. It may be because the model was training and that changed some layers, so the tensor shapes were no longer compatible.

@talha-0 Great catch! Thank you for the update!

I got the same issue when trying to customize my model for image classification.
I noticed that it worked the first time, but after that I got this error.
After deleting the exported model folder each time before training, it works, even with TensorFlow 2.11.0.
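
Clearing the folder before each run could be scripted roughly like this. It is a sketch, not taken from the notebook: the helper name reset_output_dir is hypothetical, and it assumes the folder holds nothing except output from earlier training runs, since everything inside it is deleted.

```python
import shutil
from pathlib import Path


def reset_output_dir(directory: str) -> Path:
    """Delete `directory` (if it exists) and recreate it empty.

    Warning: this removes everything inside the directory.
    """
    path = Path(directory)
    shutil.rmtree(path, ignore_errors=True)  # no error if it is missing
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling reset_output_dir with the export or checkpoint folder just before model.fit would mean stale files from an earlier run can no longer be picked up by a later load.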

I tried the EfficientNetV2B0 solution suggested above, but it doesn't seem to work for me.

Oh damn!

What error are you getting now?

Did you try to reference the updated Notebook 05? See: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

I recompiled the model:

model_2.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

and removed the .ckpt extension from checkpoint_path:

checkpoint_path = 'ten_percent_model_checkpoints_weights/checkpoint'

It works perfectly fine now.

Using tf.keras.applications.efficientnet_v2.EfficientNetV2B0 didn't work for me, and neither did using other versions of TensorFlow. It only works if I compile the model again before loading the weights. I think whether or not I leave the .ckpt extension in the checkpoint path does not affect the result.

I got a similar error when I tried to load the best model from Keras Tuner. I'm using a custom transformer model and the tuning works fine.

I also tested not creating a new tuner instance with the same parameters (except 'overwrite=False') and instead using the tuner instance created for fine-tuning. I don't get the error anymore, but this time I'm required to provide input_shape for model.build.

Getting the same error in 2024 as well: "ValueError: Received incompatible tensor with shape (1280, 10) when attempting to restore variable with shape (1, 1, 1152, 48) and name Adam/m/block6h_se_reduce/kernel:0." I tried installing version 2.9 but it doesn't work. Any help, @mrdbourke?

Actually, this issue occurs because the model is recompiled between the weights being saved and loaded.
In other words, we are trying to load the weights into a slightly different model (one with unlocked layers in the base model).
The obvious solution is to recreate the model from scratch and load the weights once again (and unlock the layers again if needed).
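
To make the shapes in the error message concrete, here is a toy illustration in plain Python. It is not TensorFlow's restore code. It assumes the optimizer keeps one Adam slot per trainable variable and that saved slots are matched to current ones by position. The layer name output_layer and the bias shapes are made up for illustration.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]
Variable = Tuple[str, Shape]


def adam_slots(trainable: List[Variable]) -> List[Variable]:
    """One momentum slot per trainable variable, in order (simplified)."""
    return [(f"Adam/m/{name}", shape) for name, shape in trainable]


def restore_by_position(saved: List[Variable], current: List[Variable]) -> None:
    """Pair saved and current slots by index and fail on a shape mismatch."""
    for (_, saved_shape), (name, shape) in zip(saved, current):
        if saved_shape != shape:
            raise ValueError(
                f"Received incompatible tensor with shape {saved_shape} when "
                f"attempting to restore variable with shape {shape} and name {name}."
            )


# At save time the base model was frozen, so only the 10-class head was trainable.
saved = adam_slots([("output_layer/kernel", (1280, 10)),
                    ("output_layer/bias", (10,))])

# At load time some base-model layers were unlocked, so they come first.
current = adam_slots([("block7a_se_reduce/kernel", (1, 1, 1152, 48)),
                      ("block7a_se_reduce/bias", (48,)),
                      ("output_layer/kernel", (1280, 10)),
                      ("output_layer/bias", (10,))])

try:
    restore_by_position(saved, current)
    message = "restored"
except ValueError as err:
    message = str(err)
print(message)
```

Under these assumptions, the first saved slot (the head kernel, shape (1280, 10)) gets paired with the first slot of the unlocked model (block7a_se_reduce/kernel), which matches the shapes reported in the original error. Loading the weights while the model is still in the state it was saved in, and unlocking layers afterwards, avoids that mismatch.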