keras-team/keras-io

Model not converging: the sparse categorical accuracy stays the same for all epochs.

Opened this issue · 6 comments

When comparing with the results provided in https://keras.io/examples/timeseries/timeseries_classification_transformer/, the dense layer is different:

dense_2 (Dense)        (None, 128)    256    ['global_average_pooling1d_2[0][0]']
dropout_25 (Dropout)   (None, 128)    0      ['dense_2[0][0]']
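As a side note, the parameter count in that summary already hints at the problem: a Dense layer has in_dim * units + units parameters (weights plus biases), and 256 parameters for Dense(128) means the pooling layer is feeding it only 1 feature instead of the 500 the example page shows. A quick sanity check (the dense_params helper here is just for illustration, not part of the example):

```python
# Dense layer parameter count: weights (in_dim * units) plus biases (units).
def dense_params(in_dim: int, units: int) -> int:
    return in_dim * units + units

# Matches the summary above: pooling output shape (None, 1) -> 256 params.
assert dense_params(1, 128) == 256
# What the example page's shape (None, 500) would give instead.
assert dense_params(500, 128) == 64128
```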

how to fix it?

Same experience here: model training stops early because of the EarlyStopping callback.
Setting callbacks=None results in 150 epochs with zero progress in accuracy.

Also reproducible in Colab

I think I might have found the solution. The GlobalAveragePooling1D just before the dense layer returns the wrong shape:
it returns (None, 1) for input (None, 500, 1), while the result on the example page is (None, 500).
Changing the data_format parameter to channels_first, as below, fixes that:
x = layers.GlobalAveragePooling1D(data_format="channels_first", name="TooSmallOutput")(x)

With that change, the model does converge.
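The shape difference can be sketched with plain NumPy (a stand-in for what the pooling layer computes, not Keras itself): with channels_last, GlobalAveragePooling1D averages over the steps axis, collapsing (batch, 500, 1) to (batch, 1); with channels_first, it averages over the last axis, here the single feature, leaving all 500 values for the dense layer.

```python
import numpy as np

x = np.random.rand(4, 500, 1)  # (batch, steps, features)

# channels_last: average over the steps axis (axis 1) -> (batch, features)
out_last = x.mean(axis=1)
# channels_first: average over the last axis (axis 2) -> (batch, 500)
out_first = x.mean(axis=2)

assert out_last.shape == (4, 1)    # what the issue reports
assert out_first.shape == (4, 500) # what the example page shows
```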

DISCLAIMER: I am new to transformers, only played with LSTM's so far.
From the tensorflow spec:
"channels_last" corresponds to inputs with shape (batch, steps, features) while "channels_first" corresponds to inputs with shape (batch, features, steps)

If 500 is the number of steps (the sequence length) and there is only one feature in the dataset, channels_last makes sense, so I don't understand why channels_first works and channels_last does not.

Without " name="TooSmallOutput" it is also working.
Regarding TNN, I am also new to the machine learning and am currently trying some examples.

Ah, I forgot to remove that.
I added name="TooSmallOutput" to debug it; it just prints out the name in the model summary.

Thanks for the reply

For multiclass classification using time series data, the accuracy is only about 0.009. What factors can improve accuracy?