yingkaisha/keras-unet-collection

transunet_2d

fatihnurcin opened this issue · 15 comments

I can't save the model, and I get:
NotImplementedError: Layer patch_extract has arguments in __init__ and therefore must override get_config.

I've fixed some model saving issues in keras-unet-collection==0.1.10, but for this one, have you tried model.save(filepath, save_traces=True)?
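
For reference, a minimal sketch of that suggestion (the transunet_2d arguments below are placeholders and must match your own configuration; save_traces only applies to the TensorFlow SavedModel format, so the filepath is a directory rather than an .h5 file):

from keras_unet_collection import models

# placeholder architecture arguments; use the ones from your own training setup
model = models.transunet_2d((224, 224, 3), filter_num=[64, 128, 256, 512],
                            n_labels=2, num_heads=3, num_transformer=3)

# save_traces=True stores forward-pass traces of the custom layers,
# which sidesteps their missing get_config implementations
model.save('transunet_savedmodel', save_traces=True)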

The error related to saving the model occurred during training;
it did not allow training to continue.

model.fit was set up as follows:

checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath, verbose=1,
                                                save_best_only=True, mode='min')
callbacks_list = [checkpoint]  # the checkpoint is passed to fit via this list

history = model.fit(train_generator,
                    steps_per_epoch=2800 // batch_size,  # integer division for step counts
                    epochs=epochs,
                    validation_data=val_generator,
                    validation_steps=600 // batch_size,
                    callbacks=callbacks_list,
                    verbose=1)

I tried the new version; training no longer stops.
However, it gives the following warning:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/generic_utils.py:497: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
category=CustomMaskWarning)

I do not have a solid understanding of this warning message. A possible cause is that transunet_2d contains customized layer classes. If it doesn't impact performance, then maybe leave it there for now.
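
For context, the warning points at custom layers that do not serialize their constructor arguments. A minimal sketch of the usual fix (a generic pattern, not the package's actual patch_extract code):

import tensorflow as tf

class PatchExtract(tf.keras.layers.Layer):
    """Toy custom layer with a constructor argument, for illustration only."""
    def __init__(self, patch_size, **kwargs):
        super().__init__(**kwargs)
        self.patch_size = patch_size

    def call(self, images):
        # extract non-overlapping patch_size x patch_size patches
        p = self.patch_size
        return tf.image.extract_patches(images,
                                        sizes=[1, p, p, 1],
                                        strides=[1, p, p, 1],
                                        rates=[1, 1, 1, 1],
                                        padding='VALID')

    def get_config(self):
        # serialize the __init__ arguments so save/load can rebuild the layer
        config = super().get_config()
        config.update({'patch_size': self.patch_size})
        return config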

For loading a pre-trained model, loading the weights into a new model with the same architecture will always work.
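
A minimal sketch of that weights-only route (the transunet_2d arguments are placeholders and must match the configuration used during training):

from keras_unet_collection import models

# rebuild the exact architecture used during training (placeholder arguments)
model = models.transunet_2d((224, 224, 3), filter_num=[64, 128, 256, 512],
                            n_labels=2, num_heads=3, num_transformer=3)

# load the weights saved by ModelCheckpoint into the rebuilt model
model.load_weights('model_epoch24.h5')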

For the base model:
I can't load the model with the lowest validation error to test accuracy on the testing data.

For the following code:

model2 = tf.keras.models.load_model('model_epoch24.h5',
                                    custom_objects={"jaccard_coef": jaccard_coef,
                                                    "jaccard_coef_int": jaccard_coef_int})

I get this error:

ValueError: Unknown layer: patch_extract. Please ensure this object is passed to the custom_objects argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.

If you saved with save_traces=False, all of the custom layers have to be passed explicitly:

from tensorflow import keras
from keras_unet_collection.transformer_layers import patch_extract, patch_embedding
from keras_unet_collection.activations import GELU

# every custom object referenced by the saved model must be registered here
model = keras.models.load_model('model_epoch24.h5',
                                custom_objects={"jaccard_coef": jaccard_coef,
                                                "jaccard_coef_int": jaccard_coef_int,
                                                "GELU": GELU,
                                                "patch_extract": patch_extract,
                                                "patch_embedding": patch_embedding})

Thank you for the response

I solved it by loading the weights rather than the model, but thanks a lot, I will try that one as well.

One final question: I am supposed to use TransUNet with 12 heads and 12 transformer blocks, as implemented in the original paper.
However, I am not able to do that; I can use at most 3 heads and 3 transformer blocks with 240x240x3 images at batch size 16.
I converted the images to grayscale and reshaped them to 224x224 with batch size 16 (as implemented in the paper) and still wasn't able to use 12 heads and 12 transformer blocks.

In the original paper they had a GPU with 12 GB of RAM, and I have Google Colab Pro, which provides a GPU with 16 GB of RAM.

It seems there could be a memory issue originating from the model implementation, but I am not sure.
Under what conditions are you able to run the model? How much GPU RAM and system RAM do you have?

Thanks a lot

I'm running on an NVIDIA Tesla V100 (32 GB).

Training on single samples (train_on_batch) can reduce the memory cost. This training setup fluctuates more, but it worked on my problems with learning-rate decay.
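
A minimal sketch of that setup, assuming the model and train_generator from the snippets above and hypothetical schedule values:

import tensorflow as tf

# hypothetical values for illustration; adjust to your own problem
epochs, steps_per_epoch, lr0 = 50, 2800, 1e-4

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr0),
              loss='categorical_crossentropy')

for epoch in range(epochs):
    # simple exponential learning-rate decay, applied once per epoch
    tf.keras.backend.set_value(model.optimizer.learning_rate,
                               lr0 * 0.95 ** epoch)
    for step in range(steps_per_epoch):
        x, y = next(train_generator)   # one small batch at a time
        loss = model.train_on_batch(x, y)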

In the original TransUNet implementation the patch size is 16x16, while in this implementation it is 1x1. That might be why it is not working with 12 heads and 12 layers at batch size 16.
Is there any chance you could add an option to vary the patch size, please?
Thank you very much, I appreciate all the feedback.

I will add it in the coming days, but there is a concern:

The transformer blocks take encoded tensors as inputs. Given a patch size of 16-by-16, the smallest tensor of the TransUNet must be at least 32-by-32. That said, varying/increasing the patch size requires some internal checks.

The original TransUNet implementation was proposed for 512-by-512 inputs; it produces sufficiently large encoded tensors for a large patch size.
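
As a rough illustration of the kind of internal check this implies (a hypothetical helper, not the package's code):

def check_patch_size(feature_size, patch_size):
    """Hypothetical sanity check: the encoded feature map must split
    evenly into at least a 2x2 grid of patches."""
    if feature_size % patch_size != 0:
        raise ValueError(f"feature size {feature_size} is not divisible "
                         f"by patch size {patch_size}")
    if feature_size // patch_size < 2:
        raise ValueError("encoded tensor is too small for this patch size")

check_patch_size(32, 16)   # passes: a 32x32 tensor gives a 2x2 grid of 16x16 patches
# check_patch_size(16, 16) # would fail: only a single patch per axis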


Any estimate of when it will be available?


Hi @fatihnurcin, I emailed the authors of TransUNet about some of their network design issues. I will update this package based on their feedback. (I will resolve them myself if my inquiries are ignored.)

[Image: screenshot of the TransUNet authors' response on patch size]

@fatihnurcin Based on the response of the TransUNet authors, a 1-by-1 patch size should be used. See the image above for details.

(Some comments are deleted to clear up the timeline)

Thank you so much for the clarification. The only remaining issue is that the model is too large compared to the original: on Google Colab Pro I can only use 3 heads and 3 layers with a batch size of 16, whereas according to the paper I should at least be able to use a batch size of 16 with 12 heads and 12 layers.
One other difference from the original paper is the lack of dropout regularization.
Regardless, thank you so much for the effort and help. I really appreciate it. All the best.


Hi, I am having the same problem with the memory issue. @yingkaisha Do you have any timeline for when you can address this?
Thanks