keras-team/keras-io

Model not converging: the sparse categorical accuracy stays the same for all epochs.

Opened this issue · 6 comments

When comparing with the results provided in https://keras.io/examples/timeseries/timeseries_classification_transformer/, the dense layer is different:

dense_2 (Dense)        (None, 128)    256    ['global_average_pooling1d_2[0][0]']
dropout_25 (Dropout)   (None, 128)    0      ['dense_2[0][0]']
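As a side note, the parameter count in that summary already hints at the problem: a Dense layer has in_dim * units + units parameters (weights plus biases), and 256 parameters for Dense(128) means the pooling layer is feeding it only 1 feature instead of the 500 the example page shows. A quick sanity check (the dense_params helper here is just for illustration, not part of the example):

```python
# Dense layer parameter count: weights (in_dim * units) plus biases (units).
def dense_params(in_dim: int, units: int) -> int:
    return in_dim * units + units

# Matches the summary above: pooling output shape (None, 1) -> 256 params.
assert dense_params(1, 128) == 256
# What the example page's shape (None, 500) would give instead.
assert dense_params(500, 128) == 64128
```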

how to fix it?

Same experience here: model training stops early because of the EarlyStopping callback.
Setting callbacks=None results in 150 epochs with zero progress in accuracy.

Also reproducible in Colab

I think I might have found the solution. The GlobalAveragePooling1D just before the dense layer returns the wrong shape:
it returns (None, 1) for input (None, 500, 1), while the result on the example page is (None, 500).
Changing the data_format parameter to channels_first, as below, fixes that:
x = layers.GlobalAveragePooling1D(data_format="channels_first", name="TooSmallOutput")(x)

With that change, the model does converge.
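The shape difference can be sketched with plain NumPy (a stand-in for what the pooling layer computes, not Keras itself): with channels_last, GlobalAveragePooling1D averages over the steps axis, collapsing (batch, 500, 1) to (batch, 1); with channels_first, it averages over the last axis, here the single feature, leaving all 500 values for the dense layer.

```python
import numpy as np

x = np.random.rand(4, 500, 1)  # (batch, steps, features)

# channels_last: average over the steps axis (axis 1) -> (batch, features)
out_last = x.mean(axis=1)
# channels_first: average over the last axis (axis 2) -> (batch, 500)
out_first = x.mean(axis=2)

assert out_last.shape == (4, 1)    # what the issue reports
assert out_first.shape == (4, 500) # what the example page shows
```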

DISCLAIMER: I am new to transformers, only played with LSTM's so far.
From the tensorflow spec:
"channels_last" corresponds to inputs with shape (batch, steps, features) while "channels_first" corresponds to inputs with shape (batch, features, steps)

If 500 is the number of steps (the sequence length) and there is only one feature in the dataset, channels_last makes sense, so I don't understand why channels_first works and channels_last does not.

Without " name="TooSmallOutput" it is also working.
Regarding TNN, I am also new to the machine learning and am currently trying some examples.

Ah, I forgot to remove that.
I added name="TooSmallOutput" to debug it; it just prints out the name in the model summary.

Thanks for the reply

For multiclass classification using time series data, the accuracy is only about 0.009. What factors can improve accuracy?