Notebook 05: load_weights results in "Incompatible tensor with shape (1280, 10)..."
ivanthecrazy opened this issue · 23 comments
When creating model_2 and trying to load the weights with
model_2.load_weights(checkpoint_path)
I'm getting the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
[<ipython-input-55-d2e3006b884f>](https://localhost:8080/#) in <cell line: 2>()
1 # Load model from checkpoint, that way we can fine-tune from the same stage the 10 percent data model was fine-tuned from
----> 2 model_2.load_weights(checkpoint_path) # revert model back to saved weights
1 frames
[/usr/local/lib/python3.9/dist-packages/tensorflow/python/ops/resource_variable_ops.py](https://localhost:8080/#) in _restore_from_tensors(self, restored_tensors)
718 self.handle, self.shape, restored_tensor)
719 except ValueError as e:
--> 720 raise ValueError(
721 f"Received incompatible tensor with shape {restored_tensor.shape} "
722 f"when attempting to restore variable with shape {self.shape} "
ValueError: Received incompatible tensor with shape (1280, 10) when attempting to restore variable with shape (1, 1, 1152, 48) and name Adam/m/block7a_se_reduce/kernel:0.
I tried to download the notebook from this repo, but have the same result.
Hi @ivanthecrazy,
Investigating this issue myself.
I'm going through the following resources:
- keras-team/tf-keras#442
- https://discuss.tensorflow.org/t/using-efficientnetb0-and-save-model-will-result-unable-to-serialize-2-0896919-2-1128857-2-1081853-to-json-unrecognized-type-class-tensorflow-python-framework-ops-eagertensor/12518/28
Looks like it's an issue with newer versions of TensorFlow when using tf.keras.applications.efficientnet models together with the load_weights() method.
My current solution is installing TensorFlow 2.9.0 (as suggested by the links above) and running it from there.
For example:
# Install TensorFlow 2.9.0 to avoid issues (later versions may work)
# -U stands for "update" and "-q" stands for "quiet"
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")
I will make sure this works and investigate it further if something is wrong.
I'll post another comment here once I've fixed the notebook: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb
Update: I've confirmed that running notebook 05 works end-to-end with TensorFlow 2.9.0 (as per the links above).
Install TensorFlow 2.9.0 with:
# Install TensorFlow 2.9.0 to avoid issues (later versions may work)
# -U stands for "update" and "-q" stands for "quiet"
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")
I'm not quite sure what's happening with later versions (e.g. 2.10.0+); the issues above seem to be long-standing.
The notebook code has been updated to reflect installing TensorFlow 2.9.0 at the start.
See the updated code here and let me know how it goes: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb
I had the same problem. I can verify that Daniel's new notebook (TF version 2.9.0) works fine. Furthermore, all the "Model failed to serialize as JSON" warnings that appeared while fitting the various models have disappeared.
Hi @filipposkar, glad to hear you got it fixed!
Looks like this should also be fixed further in upcoming versions of TensorFlow (e.g. 2.13+).
For now, it looks like TensorFlow 2.9.0 works.
See this comment here: keras-team/tf-keras#383
Update: looks like TensorFlow 2.9.0 is still the most stable here, see: #553
TL;DR: tried tf-nightly (2.14.0-dev20230520) and it still broke.
Thank you @mrdbourke
I have an issue with changing the version of TensorFlow: '!pip install -U -q tensorflow==2.9.0' doesn't work.
Here is my temporary workaround.
The contributing factor seems to stem from the line of code at [29] in @mrdbourke's 05_transfer_learning notebook, where the TF install upgrades to the latest version of TF; however, some of the dependencies are deprecated in the latest version, since we are working with TensorFlow 2.9.0.
- Removed import tensorflow as tf from block [29] of the 05_transfer_learning notebook on GitHub
- Scroll to the top of your code to block [1] and use
- !pip uninstall -y tensorflow to remove the 2.12.x version
- Insert the tensorflow==2.9.0 install and import in block [2]
- !pip install -U -q tensorflow==2.9.0
- import tensorflow as tf
- print(tf.__version__)
- from tensorflow import keras
(Notes on protobuf are below, as the dependency is incompatible; however, the results came out the same as predicted.)
- I cleared all outputs and ran the code from the beginning.
The Protobuf dependency used in TensorFlow is used to serialize and deserialize data. This means that it can be used to convert data from one format to another, such as from a Python object to a binary file. This is useful for TensorFlow because it allows models to be saved and loaded easily, and it also allows for communication between different TensorFlow components.
Specifically, Protobuf is used in TensorFlow for the following purposes:
- Serializing and deserializing TensorFlow models: When a TensorFlow model is saved, it is serialized into a Protobuf file. This file can then be loaded back into TensorFlow to restore the model.
- Communicating between different TensorFlow components: TensorFlow components, such as the TensorFlow Serving server and the TensorFlow Lite library, use Protobuf to communicate with each other. This allows them to exchange data in a format that is both efficient and easy to understand.
- Providing a common data format for TensorFlow and other libraries: Protobuf is a widely used data format, so it can also be used to communicate with other libraries that use Protobuf. This makes it easier to integrate TensorFlow with other libraries, such as the gRPC RPC framework.
Overall, the Protobuf dependency used in TensorFlow is a valuable tool that allows TensorFlow models to be saved, loaded, and communicated with other components. It is a versatile data format that is widely used in the industry, and it makes TensorFlow more accessible to other libraries and frameworks. - Source: Bard
Your version of protobuf will most likely result in errors with tensorflow-datasets.
It requires a much more recent version. The issue is that it requires a module called builder.py that's not present in version 3.19.x.
The best workaround for that so far is to force-reinstall protobuf 3.20.3 using pip install --force-reinstall "protobuf==3.20.3" (note the double equals sign; pip rejects a single "=" as an invalid requirement). Pip will complain about incompatibilities left and right, but I've found it to work without issues so far with TF 2.9 to 2.12 with tensorflow-datasets and other libraries.
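A quick way to see whether the installed protobuf already ships that module is to probe for it before importing tensorflow-datasets. This is only a minimal sketch: the dotted path google.protobuf.internal.builder is an assumption about where builder.py lives, and has_module is a hypothetical helper name, not part of any library.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located without fully importing it.

    Note: for a dotted name, find_spec imports the parent packages, so a
    missing parent shows up as an ImportError, which we treat as "absent".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


# Assumed location of builder.py inside the protobuf package.
BUILDER_MODULE = "google.protobuf.internal.builder"

if __name__ == "__main__":
    if has_module(BUILDER_MODULE):
        print("protobuf provides builder.py")
    else:
        print("builder.py not found; protobuf may be missing or too old")
```

If the check reports that the module is missing, the force-reinstall described above is the next thing to try.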
Hi @mrdbourke .
I ran the suggested lines:
!pip uninstall -y tensorflow
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")
But it showed this.
Found existing installation: tensorflow 2.9.0
Uninstalling tensorflow-2.9.0:
Successfully uninstalled tensorflow-2.9.0
WARNING: Ignoring invalid distribution -ensorflow (/usr/local/lib/python3.10/dist-packages)
WARNING: Ignoring invalid distribution -ensorflow (/usr/local/lib/python3.10/dist-packages)
TensorFlow version: 2.12.0
Did you restart the runtime? IIRC, TensorFlow tells you the change will only take effect after restarting it.
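One way to tell whether a restart is still pending is to compare the version of the module already loaded in the running process with the version pip has installed on disk. The following is a rough sketch, not an official TensorFlow check: needs_restart and check_package are hypothetical helper names, and it assumes the distribution is registered under the name "tensorflow".

```python
import sys
from importlib import metadata
from typing import Optional


def needs_restart(loaded_version: Optional[str],
                  on_disk_version: Optional[str]) -> bool:
    """A restart is needed when a module is loaded but differs from disk."""
    if loaded_version is None:
        # Nothing imported yet, so the next import picks up the new install.
        return False
    return loaded_version != on_disk_version


def check_package(name: str = "tensorflow") -> bool:
    """Compare the in-memory module (if any) with the installed distribution."""
    module = sys.modules.get(name)
    loaded = getattr(module, "__version__", None)
    try:
        on_disk = metadata.version(name)
    except metadata.PackageNotFoundError:
        on_disk = None
    return needs_restart(loaded, on_disk)


if __name__ == "__main__":
    if check_package("tensorflow"):
        print("Restart the runtime so the newly installed version is loaded.")
    else:
        print("Loaded and installed versions agree (or nothing is loaded yet).")
```

In the output above, the process reports 2.12.0 while pip was asked for 2.9.0, which is the mismatch this check is meant to surface.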
Hi all,
After much troubleshooting, I've found the best fix for tf.keras.applications.EfficientNetB0 problems is to simply upgrade to tf.keras.applications.efficientnet_v2.EfficientNetV2B0.
You can see a full write-up of the fix here: #575
It worked for me when I recompiled the model before loading the weights. It may be because the model was training and that changed some layers, so the tensor shapes were no longer compatible.
I got the same issue when trying to customize my model for image classification.
I noticed that it worked the first time, but after that I got this error.
After deleting the exported model folder each time I train, it works, even with TensorFlow 2.11.0.
I tried the solution here but it doesn't seem to work for me
Oh dam!
What error are you getting now?
Did you try to reference the updated Notebook 05? See: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb
I recompiled the model:
model_2.compile(loss='categorical_crossentropy',
optimizer=tf.keras.optimizers.Adam(),
metrics='accuracy')
and got rid of .ckpt from the checkpoint_path:
checkpoint_path = 'ten_percent_model_checkpoints_weights/checkpoint'
It just works perfectly fine now.
Using tf.keras.applications.efficientnet_v2.EfficientNetV2B0 didn't work for me, and neither did using other versions of TensorFlow. It only works if I compile the model again before loading the weights. Whether or not I keep the .ckpt extension in the checkpoint path does not affect the result, I think.
Getting the same error in 2024 as well: "ValueError: Received incompatible tensor with shape (1280, 10) when attempting to restore variable with shape (1, 1, 1152, 48) and name Adam/m/block6h_se_reduce/kernel:0." I tried downloading the 2.9 version but it doesn't work. Any help, @mrdbourke?
Actually, this issue is caused by the model being recompiled between the time the weights are saved and the time they are loaded.
In other words, we are trying to load the weights into a slightly different model (with unlocked layers in the base model).
The quite obvious solution is to recreate the model from scratch and load the weights once again (and unlock the layers once again if needed).
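The order of operations can be sketched as a small helper. This is only an illustration of the idea, not code from the notebook: restore_then_unfreeze, build_model, and unfreeze are hypothetical names, and the only assumption about the model is that it exposes load_weights(path), as Keras models do.

```python
from typing import Any, Callable, Optional


def restore_then_unfreeze(build_model: Callable[[], Any],
                          checkpoint_path: str,
                          unfreeze: Optional[Callable[[Any], None]] = None) -> Any:
    """Rebuild the model as it was when saved, load weights, then unfreeze.

    build_model: returns a freshly created and compiled model in the same
        state it had when the checkpoint was written (base layers frozen).
    unfreeze: optional callback that unlocks layers and recompiles; it runs
        only after the weights have been restored.
    """
    model = build_model()                # 1. recreate the model from scratch
    model.load_weights(checkpoint_path)  # 2. restore into the matching model
    if unfreeze is not None:
        unfreeze(model)                  # 3. only now change trainable layers
    return model


# Hypothetical usage with Keras (function names are placeholders):
# model_2 = restore_then_unfreeze(create_frozen_model,
#                                 checkpoint_path,
#                                 unfreeze_last_layers_and_recompile)
```

Keeping the unfreeze step in a callback makes it hard to accidentally change the model between building it and loading the checkpoint.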