STMicroelectronics/stm32ai-modelzoo

training handposture

141391 opened this issue · 4 comments

141391 commented

Hi!
I got the following error when trying to train handposture, and I haven't been able to find a solution.
The data set I used is the compressed data set package in the original project, and basically no changes were made to the code.
The specific error reported is as follows:
Error executing job with overrides: [] Traceback (most recent call last): File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\training\train.py", line 45, in main train(configs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\utils\utils.py", line 143, in train history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks, File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 552, in safe_patch_function patch_function.call(call_original, *args, **kwargs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 170, in call return cls().__call__(original, *args, **kwargs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 181, in __call__ raise e File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 174, in __call__ return self._patch_implementation(original, *args, **kwargs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 232, in _patch_implementation result = super()._patch_implementation(original, *args, **kwargs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\tensorflow\__init__.py", line 1255, in _patch_implementation history = original(inst, *args, **kwargs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 535, in call_original return call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 470, in call_original_fn_with_event_logging original_fn_result = original_fn(*og_args, **og_kwargs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 532, in _original_fn original_result = original(*_og_args, **_og_kwargs) File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler raise e.with_traceback(filtered_tb) from None File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, **tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:**
The error location code is as follows:
print("[INFO] : Starting training...") history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks, epochs=cfg.train_parameters.training_epochs)

The relevant configuration is as follows:
train_parameters: batch_size: 32 training_epochs: 1000 optimizer: Adam initial_learning: 0.01 learning_rate_scheduler: Constant
model: model_type: {name : CNN2D_ST_HandPosture, version: v1} input_shape: [8, 8, 2] dropout: 0.2

LFOSTM commented

Hello,
We will have a look at it soon.
Do you reproduce the issue without doing any change in the code?

141391 commented

Hello, We will have a look at it soon. Do you reproduce the issue without doing any change in the code?

Hello
I haven't made any changes to the code. I just configured the relevant environment and changed the file path of cubeAI. Unfortunately, I haven't solved this aspect yet.

141391 commented

Hello, We will have a look at it soon. Do you reproduce the issue without doing any change in the code?

Hello!
I have solved this problem. The reason why I have this problem is that my protobuf version is too high. I originally lowered the version to 3.20.1 with reference, but it didn't seem to work. I just downgraded the version to 3.19.0. The model is finally ready to train! ! !
Thank you very much for your reply!
I will continue to try to deploy to the hardware.
Thank you!

LFOSTM commented

Great!
Don't hesitate to share the outcome of your training and deployment.
I will close the issue in a couple of days.