philipperemy/keras-tcn

tcn not converging for neural data prediction

catubc opened this issue · 6 comments

Describe the bug
Hello

This is not a bug; it's more of a question about usage. Sorry for using the "bug" report, but I wasn't sure what the correct field was.

I'm trying to predict a behavior label (0 = no behavior; 1 = behavior) from temporal neural data and am having a hard time getting the model to converge. My data has shape [n_samples, n_timesteps, n_dimensions], where n_dimensions ranges from 1 to 10 (these are principal components, so even n_dimensions = 1 captures a lot of the structure in the data).

Linear classifiers do relatively well at predicting an upcoming behavior as far back as 5-10 seconds before it occurs, and vanilla LSTMs also predict increasingly well closer to the behavior. So I know there is learnable structure in the data, and I was hoping that some form of temporal convolution would recover it better than these other methods.
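For concreteness, here is a minimal sketch of the array shapes this setup implies (the sample count of 80 is a placeholder, not the real number of trials), with one label per time step since the model below uses return_sequences=True:

import numpy as np

# Placeholder shapes for illustration only; the real recordings are not in this issue.
n_samples, n_timesteps, n_dimensions = 80, 450, 6

X = np.random.randn(n_samples, n_timesteps, n_dimensions).astype('float32')      # PCA-reduced neural data
y = np.random.randint(0, 2, size=(n_samples, n_timesteps, 1)).astype('float32')  # behavior label per time step

print(X.shape, y.shape)  # (80, 450, 6) (80, 450, 1)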

I'm setting up the model like this:

from tcn import TCN
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Free batch size, 450 time steps, 6 principal components per step.
batch_size, time_steps, input_dim = None, 450, 6

# TCN backbone.
tcn_layer = TCN(input_shape=(time_steps, input_dim),
                return_sequences=True,  # I need this as I want to see the development of the prediction over time
                dilations=(1, 2, 4, 8, 16, 32, 64, 128, 256, 512))

# One sigmoid output per time step.
m = Sequential([
    tcn_layer,
    Dense(1, activation='sigmoid')
])
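The compile call itself isn't shown above; a minimal sketch of what it could look like, assuming binary cross-entropy and Adam (neither is confirmed in the thread):

# Assumed compile step; loss and optimizer are not stated in the issue.
m.compile(optimizer='adam', loss='binary_crossentropy')
m.summary()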

And after compiling the model I get this:

layers[i]: <tcn.tcn.TCN object at 0x7fa514299670>
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
tcn (TCN)                    (None, 450, 64)           236352    
_________________________________________________________________
dense (Dense)                (None, 450, 1)            65        
=================================================================
Total params: 236,417
Trainable params: 236,417
Non-trainable params: 0

The model overfits after a few hundred epochs (i.e. it does OK on the training data), but the val_loss is still very large and it performs at chance or worse on test data (I also attach an image of the average prediction at each time point):

Epoch 272/300
5/5 [==============================] - 0s 39ms/step - loss: 0.0589 - val_loss: 0.4058
Epoch 273/300
5/5 [==============================] - 0s 43ms/step - loss: 0.0588 - val_loss: 0.4056
Epoch 274/300
5/5 [==============================] - 0s 43ms/step - loss: 0.0588 - val_loss: 0.4055
Epoch 275/300
5/5 [==============================] - 0s 44ms/step - loss: 0.0588 - val_loss: 0.4054

Any advice on perhaps tweaking the TCN input params to improve on prediction?
[image: average prediction at each time point]

You can refer to the guide here: https://github.com/philipperemy/keras-tcn#how-do-i-choose-the-correct-set-of-parameters-to-configure-my-tcn-layer.

Try to print your receptive field with:

print('Receptive field size =', tcn_layer.receptive_field)

Also print a summary of both the TCN and the LSTM and try to have roughly the same number of weights. That will give you an idea of whether your model is over-parameterized.
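A minimal sketch of that comparison (the LSTM width of 64 is just a placeholder to tune until the parameter counts are comparable):

from tcn import TCN
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential

time_steps, input_dim = 450, 6

# TCN model as in the issue.
tcn_model = Sequential([
    TCN(input_shape=(time_steps, input_dim), return_sequences=True),
    Dense(1, activation='sigmoid')
])

# LSTM counterpart with the same head; adjust the number of units until
# the total parameter counts are roughly the same.
lstm_model = Sequential([
    LSTM(64, input_shape=(time_steps, input_dim), return_sequences=True),
    Dense(1, activation='sigmoid')
])

print('TCN params: ', tcn_model.count_params())
print('LSTM params:', lstm_model.count_params())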

And just match the receptive field to 450 (aim a bit higher). In your case, I think your RF is very high, hence very unstable training with the amount of data you have.
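A quick sketch for sizing the receptive field, assuming keras-tcn computes it as 1 + 2*(kernel_size - 1)*nb_stacks*sum(dilations) (two causal convolutions per residual block); double-check against the value printed by tcn_layer.receptive_field:

def tcn_receptive_field(kernel_size, dilations, nb_stacks=1):
    # Assumed formula: two causal convolutions per residual block.
    return 1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations)

# kernel_size=8 with dilations (1, 4, 16, 64) gives an RF of 1191, far above 450.
print(tcn_receptive_field(8, (1, 4, 16, 64)))            # 1191

# kernel_size=3 with dilations (1, 2, 4, 8, 16, 32, 64) just clears 450.
print(tcn_receptive_field(3, (1, 2, 4, 8, 16, 32, 64)))  # 509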

Thanks so much Philippe. I started tweaking some of the parameters, and the training-set prediction looks much better. There is much higher variance early on (e.g. at t = -15 s) than at t = 0, so this asymptotic shape is what we'd like to see.

But I'm still getting overfitting (or rather, a lack of generalization) on the test sets. One thing I should mention is that we have very few samples (usually 50-100), and I'm worried that overfitting will be difficult to avoid.

[image: training-set prediction at each time point]

I will keep tweaking, but here are a couple of quick questions that would help me along, as there are many params:

  1. Our time steps are 450, and RF 1,191. Is that still too much?

  2. Also, it was not completely clear what LSTM you were referring to? For now, I'm only using a tcn + dense layer. Are you suggesting we add an LSTM also? I tested this before, and it wasn't clearly helping.

  3. Any other obvious tweaks?

Here's the initialization code:

# If time_steps > tcn_layer.receptive_field, then we should not
# be able to solve this task.
batch_size, time_steps, input_dim = None, 450, 6

# Coarser dilation schedule than the first attempt.
dilations = (1, 4, 16, 64)

# TCN with dropout and batch normalization for regularization.
tcn_layer = TCN(nb_filters=64,
                kernel_size=8,
                input_shape=(time_steps, input_dim),
                return_sequences=True,
                dilations=dilations,
                dropout_rate=0.5,
                use_batch_norm=True,
                use_layer_norm=False,
                use_weight_norm=False)

And the model params:

Receptive field size = 1191
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
tcn_1 (TCN)                  (None, 450, 64)           235456    
_________________________________________________________________
dense_1 (Dense)              (None, 450, 1)            65        
=================================================================
Total params: 235,521
Trainable params: 234,497
Non-trainable params: 1,024

@catubc I think it's going to be difficult to apply a TCN without overfitting on your dataset. The time series are very long (450 steps) and you have just 50-100 of them. It's quite challenging!

Our time steps are 450, and RF 1,191. Is that still too much?

  • Not too much! As long as your RF is greater than your sequence length, the TCN sees the whole sequence. You could still lower it to something like 512, or just above 450, to reduce the complexity of the model.

Also, it was not completely clear what LSTM you were referring to? For now, I'm only using a tcn + dense layer. Are you suggesting we add an LSTM also? I tested this before, and it wasn't clearly helping.

  • Just swap the TCN layer with an LSTM layer to compare (in terms of number of parameters and training). Do not use TCN and LSTM in the same model ;)

Any other obvious tweaks?

Hi Philippe
Thank you again for taking the time to respond. It seems there are some parameters that work better than others. For example, the dropout rate seems to have a pretty narrow usable range (e.g. 0.05-0.1) for some parameter pairs. It also seems like the smaller models don't do well (even on test data), so higher depth / more params seems to do better (but I've not exhausted the options).
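A minimal sketch of the kind of sweep being described, with placeholder data in place of the real recordings (shapes match the thread, values are random):

import numpy as np
from tcn import TCN
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Placeholder data; the real recordings are not available in this issue.
X_train = np.random.randn(60, 450, 6).astype('float32')
y_train = np.random.randint(0, 2, (60, 450, 1)).astype('float32')
X_val = np.random.randn(20, 450, 6).astype('float32')
y_val = np.random.randint(0, 2, (20, 450, 1)).astype('float32')

val_losses = {}
for dropout_rate in (0.05, 0.1, 0.25, 0.5):
    model = Sequential([
        TCN(nb_filters=64, kernel_size=8, dilations=(1, 4, 16, 64),
            dropout_rate=dropout_rate, return_sequences=True,
            input_shape=(450, 6)),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                     epochs=50, batch_size=16, verbose=0)
    val_losses[dropout_rate] = min(hist.history['val_loss'])

print(val_losses)  # lowest val_loss seen for each dropout rate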

[image: results for TCN with nb_filters=128, kernel_size=32, dilations up to 128, skip connections off, dropout rate 0.05]

I will try to add regularization, but it seems your TCN module does not expose the standard Keras regularizer arguments, so I guess I can only apply it to the dense layer. That doesn't sound optimal, though; I would think we need to constrain the parameters in the TCN layer, not the dense one.
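For the dense layer, a minimal sketch of what that would look like with standard Keras L2 regularization (the 1e-4 weight is an arbitrary starting point; whether the TCN layer itself accepts regularizers would need checking against the keras-tcn API):

from tcn import TCN
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    TCN(nb_filters=64, kernel_size=8, dilations=(1, 4, 16, 64),
        dropout_rate=0.1, return_sequences=True, input_shape=(450, 6)),
    # L2 penalty on the output head only.
    Dense(1, activation='sigmoid', kernel_regularizer=regularizers.l2(1e-4))
])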

Thanks again for the help!

I get the same problem but maybe for a different reason:
#204

Good to know. For my data, the last plot above is actually not correct (I accidentally trained on test data due to a randomization step), so I'm back to square one in finding params that work...