philipperemy/keras-tcn

Callback issue with Colab

clearly-outsane opened this issue · 13 comments

I'm using the example code under API in the README (the regression example), and the only change I'm making is adding a callback, like so:

model.fit(x_train, y_train, validation_split=0.2, epochs=300, callbacks=[tensorboard_callback])

I keep getting this error:
[image: screenshot of the error]

I'm not sure whether this is a TensorFlow problem or a problem with the library, but I only get this error when I use the TCN from this repo.

@clearly-outsane can you update your TensorFlow version? Or maybe try to downgrade it to 2.3.0?

I'm struggling to downgrade to a different minor version of TensorFlow because I'm using Colab, where it's not as simple as pip install. The version I'm running right now is 2.4.1. Could you suggest another solution, or at least a way to debug this?

@clearly-outsane are you trying to feed something with bool as a type? Can you make sure that all your x_train and y_train are of type float? If not, you can cast them with np.array(x_train, dtype=float)
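The dtype check suggested above can be sketched as follows. This is only an illustration: the boolean arrays are made-up stand-ins for the real training data, and `to_float` is a hypothetical helper name, not part of keras-tcn.

```python
import numpy as np


def to_float(data):
    """Return (float copy of data, original dtype) for array-like input."""
    original = np.asarray(data)
    # Same cast as suggested in the comment above: np.array(..., dtype=float)
    return np.array(original, dtype=float), original.dtype


# Made-up boolean data standing in for x_train / y_train
x_train = np.array([[True, False], [False, True]])
y_train = np.array([True, False])

x_train, x_dtype_before = to_float(x_train)
y_train, y_dtype_before = to_float(y_train)

print("x_train:", x_dtype_before, "->", x_train.dtype)
print("y_train:", y_dtype_before, "->", y_train.dtype)
```

If the dtypes printed before the cast are already floating-point, the input data is probably not where the boolean comes from.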

TensorFlow 2.4.1 does not seem to have any problem. I re-ran the test suite with TensorFlow 2.4.1 and everything went smoothly: https://github.com/philipperemy/keras-tcn/runs/1908410040?check_suite_focus=true#step:4:8

It runs perfectly as long as I don't add a callback. It has me stumped as well.
So I decided to make a sample Colab notebook with the MNIST example given in your repo, just to show you. The only thing I have changed is adding a TensorBoard callback.

The link to it is here. Please take a look; I'd appreciate any help!

The problem happens with 2.3.0 too. I have no idea why. Is it a TensorBoard thing?

No, I tried various other TensorFlow callbacks as well, such as early stopping and saving weights. They all give the same error. When I googled the error, I found nothing concrete to help me out either :( Essentially, there is a boolean somewhere that shouldn't be there, but I have no idea where.

Basically, no type of callback works in model.fit.

Edit - Let me know if there's any way I can help, because being able to use callbacks makes life much easier, and I think this repo is the best implementation of TCNs in TensorFlow!

@clearly-outsane happy to hear! Also, some good news: your code works well when it's not running on Colab. The callback works. I also tried EarlyStopping and it worked too. It seems to be a problem related to Colab.

Oh!
Thank you for finding that out! Now I'm even more confused, though: why does it happen only on Colab?

Okay, I'm going to try using a local runtime on Colab, and also try hosting it on an EC2 instance, and see if that works. Do you happen to know any place where we can report issues with Colab?

@clearly-outsane not sure why, but I've seen other problems that only occurred on Colab.
Let me know if you find a way to make it work :)

So I found this open issue:
tensorflow/tensorflow#43200

And with this newfound knowledge I removed weight_norm from the TCN layer, and callbacks worked as intended with no errors. I'm still not entirely sure why it errors out with tf.bool layers, as I haven't looked at the source code, but that's a quick update on what I've found.
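A minimal sketch of that workaround, assuming "removing weight_norm" means passing `use_weight_norm=False` to the `TCN` constructor. The other keyword values, the `TCN_KWARGS` name, and the `build_tcn_regressor` helper are illustrative assumptions, not code from the README or from this thread.

```python
# Illustrative TCN settings; only use_weight_norm=False is the workaround.
TCN_KWARGS = {
    "nb_filters": 64,
    "kernel_size": 3,
    "use_weight_norm": False,  # weight normalization switched off
}


def build_tcn_regressor(timesteps, input_dim):
    """Build a small TCN regression model with weight normalization off."""
    # Imported lazily so that defining this function does not require
    # TensorFlow or keras-tcn to be installed.
    from tensorflow.keras import Input
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Sequential
    from tcn import TCN

    model = Sequential([
        Input(shape=(timesteps, input_dim)),
        TCN(**TCN_KWARGS),
        Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

With a model built this way, the original `model.fit(..., callbacks=[tensorboard_callback])` call can be retried unchanged to see whether the error goes away.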

So it seems to have been fixed in TensorFlow 2.7.

tensorflow/tensorflow#43200 (comment)

I'll close this issue but feel free to re-open if it still does not work.
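For anyone landing here later, a quick way to tell whether a runtime already has the reported fix is to compare its version against 2.7. This is a sketch under the assumption, taken from the linked comment, that 2.7 is the first fixed release; `includes_reported_fix` is a hypothetical helper name.

```python
def includes_reported_fix(version, fixed_in=(2, 7)):
    """Return True if a 'major.minor.patch' version string is >= fixed_in."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= fixed_in


# Versions mentioned in this thread
for v in ("2.3.0", "2.4.1", "2.7.0"):
    print(v, includes_reported_fix(v))
```

In a notebook, `tf.__version__` can be passed in place of the hard-coded strings.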

Thanks for the update :) I'd almost forgotten about it.