Bug: An error occurs roughly every 2000 epochs when using model_checkpoint_callback (Keras ModelCheckpoint)

AWbosman opened this issue a year ago · 2 comments

While using model_checkpoint_callback (a Keras ModelCheckpoint callback), an error occurs roughly every 2000 epochs.

Bug Description

Traceback (most recent call last):
  File "./test_program/main_keras.py", line 41, in <module>
    auto_model.fit(x_train[:100], y_train[:100], callbacks = [model_checkpoint_callback])
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/auto_model.py", line 299, in fit
    **kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/engine/tuner.py", line 194, in search
    epochs=epochs, callbacks=new_callbacks, verbose=verbose, **fit_kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras_tuner/engine/base_tuner.py", line 183, in search
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras_tuner/engine/tuner.py", line 295, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/engine/tuner.py", line 102, in _build_and_fit_model
    model, self.hypermodel.batch_size, **kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/utils/utils.py", line 89, in fit_with_adaptive_batch_size
    batch_size, lambda **kwargs: model.fit(**kwargs), **fit_kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/utils/utils.py", line 101, in run_with_adaptive_batch_size
    history = func(x=x, validation_data=validation_data, **fit_kwargs)
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/utils/utils.py", line 89, in <lambda>
    batch_size, lambda **kwargs: model.fit(**kwargs), **fit_kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "path/VERONA/.venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:

Detected at node 'Cast_5' defined at (most recent call last):
  File "./test_program/main_keras.py", line 41, in <module>
    auto_model.fit(x_train[:100], y_train[:100], callbacks = [model_checkpoint_callback])
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/auto_model.py", line 299, in fit
    **kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/engine/tuner.py", line 194, in search
    epochs=epochs, callbacks=new_callbacks, verbose=verbose, **fit_kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras_tuner/engine/base_tuner.py", line 183, in search
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras_tuner/engine/tuner.py", line 295, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/engine/tuner.py", line 102, in _build_and_fit_model
    model, self.hypermodel.batch_size, **kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/utils/utils.py", line 89, in fit_with_adaptive_batch_size
    batch_size, lambda **kwargs: model.fit(**kwargs), **fit_kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/utils/utils.py", line 101, in run_with_adaptive_batch_size
    history = func(x=x, validation_data=validation_data, **fit_kwargs)
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/utils/utils.py", line 89, in <lambda>
    batch_size, lambda **kwargs: model.fit(**kwargs), **fit_kwargs
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/engine/training.py", line 1650, in fit
    tmp_logs = self.train_function(iterator)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/engine/training.py", line 1249, in train_function
    return step_function(self, iterator)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/engine/training.py", line 1233, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/engine/training.py", line 1222, in run_step
    outputs = model.train_step(data)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/engine/training.py", line 1027, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
    self.apply_gradients(grads_and_vars)
  File "path/VERONA/.venv/lib/python3.7/site-packages/autokeras/keras_layers.py", line 363, in apply_gradients
    experimental_aggregate_gradients=experimental_aggregate_gradients,
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
    return super().apply_gradients(grads_and_vars, name=name)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
    self._apply_weight_decay(trainable_variables)
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1162, in _apply_weight_decay
    variables,
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1156, in distributed_apply_weight_decay
    variable, weight_decay_fn, group=False
  File "path/VERONA/.venv/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1151, in weight_decay_fn
    wd = tf.cast(self.weight_decay, variable.dtype)
Node: 'Cast_5'
Cast string to float is not supported
	 [[{{node Cast_5}}]] [Op:__inference_train_function_1035208]
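The last frame shows where training fails: `wd = tf.cast(self.weight_decay, variable.dtype)` in the optimizer's weight-decay step, and TensorFlow refuses to cast a string to float. In other words, the optimizer's `weight_decay` attribute holds a string instead of a number. The sketch below is a hypothetical diagnostic, not part of Keras or AutoKeras: it assumes you can get the optimizer's settings as a plain dict (for example from the optimizer's `get_config()`) and flags a `weight_decay` value that is not numeric.

```python
from numbers import Real
from typing import Any, Mapping, Optional


def check_weight_decay(config: Mapping[str, Any]) -> Optional[str]:
    """Return an error message if config["weight_decay"] is not numeric.

    Hypothetical helper: `config` is assumed to be a dict of optimizer
    settings, such as the one a Keras optimizer's get_config() returns.
    """
    value = config.get("weight_decay")
    # A missing key or None means weight decay is disabled, which is fine.
    if value is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, Real) and not isinstance(value, bool):
        return None
    return (
        f"weight_decay must be a number or None, "
        f"got {type(value).__name__}: {value!r}"
    )


print(check_weight_decay({"weight_decay": 0.004}))  # None
print(check_weight_decay({"weight_decay": "0.01"}))
```

Running it before `fit` would turn the graph-level `UnimplementedError` into a readable message about the offending value.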

Bug Reproduction

Code for reproducing the bug:
import autokeras as ak
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.datasets import mnist

# Load MNIST; only the first 100 training samples are used below.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Minimal AutoKeras graph: image input -> dense block -> classification head.
input_node = ak.ImageInput()
output_node = ak.DenseBlock()(input_node)
output_node = ak.ClassificationHead()(output_node)
auto_model = ak.AutoModel(
    inputs=input_node, outputs=output_node, overwrite=True)

# Save the full model at the end of every epoch; Keras fills in {epoch}.
checkpoint_filepath = 'test_networks/checkpoint_{epoch}'
model_checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    save_best_only=False,
    monitor='val_accuracy')

auto_model.fit(x_train[:100], y_train[:100], callbacks=[model_checkpoint_callback])
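With `save_best_only=False`, the callback writes a checkpoint at the end of every epoch, and Keras expands the `filepath` template with Python's `str.format`, passing a 1-based epoch number plus the epoch's logged metrics. The sketch below only mimics that naming in plain Python to show which paths the reproduction script produces; the helper name is hypothetical and it does not call Keras.

```python
from typing import Dict, List, Optional


def checkpoint_paths(filepath: str, num_epochs: int,
                     logs: Optional[Dict[str, float]] = None) -> List[str]:
    """Mimic how a ModelCheckpoint-style filepath template is expanded.

    Hypothetical helper: epochs are counted from 0 internally, so the
    template is formatted with epoch + 1, plus any metric values in `logs`.
    """
    logs = logs or {}
    return [filepath.format(epoch=epoch + 1, **logs) for epoch in range(num_epochs)]


print(checkpoint_paths("test_networks/checkpoint_{epoch}", 3))
# ['test_networks/checkpoint_1', 'test_networks/checkpoint_2', 'test_networks/checkpoint_3']
```

The same mechanism lets a template reference logged metrics, for example `checkpoint_{epoch}_{val_accuracy:.2f}`, as long as that metric is actually logged for the epoch.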

Data used by the code: MNIST (tensorflow.keras.datasets.mnist), first 100 training samples.

Expected Behavior

Setup Details

Include the details about the versions of:

OS type and version:
Python: 3.7
autokeras:
keras-tuner:
scikit-learn:
numpy:
pandas:
tensorflow:

Additional context

Answer 1 · 2023-01-09T13:22:48.000Z

I'm having the same issue, but only when I use validation data.

Answer 2 · 2023-01-09T13:45:42.000Z

After talking with some of my colleagues: a possible fix is to downgrade TensorFlow to version 2.10.1. This has already been reported as a bug, so I will close the issue.
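The suggested workaround is pinning TensorFlow to 2.10.1 (for example `pip install tensorflow==2.10.1`). A minimal sketch of a startup guard, assuming the problem only affects releases newer than 2.10.1 (the thread does not say which version introduced it), compares a version string against that pin. The names `KNOWN_GOOD`, `parse_version` and `is_newer_than_known_good` are hypothetical; in a real script you would pass `tf.__version__`.

```python
from typing import Tuple

# Assumption: the last version reported in this thread as working.
KNOWN_GOOD = (2, 10, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.11.0' or '2.11.0rc1' into (2, 11, 0), ignoring suffixes."""
    core = version.split("+")[0].split("-")[0]
    parts = []
    for piece in core.split(".")[:3]:
        # Keep only the leading digits of each component (e.g. '0rc1' -> 0).
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_newer_than_known_good(version: str) -> bool:
    """True if `version` is newer than the pinned known-good release."""
    return parse_version(version) > KNOWN_GOOD


print(is_newer_than_known_good("2.10.1"))  # False
print(is_newer_than_known_good("2.11.0"))  # True
```

Tuple comparison keeps the check dependency-free; a project that already depends on the `packaging` library could use its version parsing instead.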