philipperemy/keras-tcn

Getting bad TCN performance

bjourne opened this issue · 4 comments

Hi. I'm using your TCN for sequence prediction and I'm getting quite bad results in comparison to an LSTM. Maybe I'm doing something wrong or maybe there is a bug?

I'm comparing the two models with the following code:

from tcn import TCN
from tensorflow.keras import *
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
import numpy as np

SEQ_LEN = 64

# Char-level language modelling: map each character to an integer id, then
# build sliding windows of SEQ_LEN characters with the next character as target.
text = open('shakespeare.txt').read()
ch2ix = {ch: i for i, ch in enumerate(sorted(set(text)))}
X = np.array([[ch2ix[ch] for ch in text[i - SEQ_LEN:i]]
              for i in range(SEQ_LEN, len(text))])
Y = np.array([ch2ix[text[i]] for i in range(SEQ_LEN, len(text))])

# Swap the LSTM line for the commented-out TCN block to compare the two.
model = Sequential([
    Embedding(len(ch2ix), 32, input_shape=(SEQ_LEN,)),
    LSTM(128),
    # TCN(nb_filters=64,
    #     kernel_size=2,
    #     nb_stacks=2,
    #     dilations=[1, 2, 4, 8]),
    Dense(len(ch2ix), activation='softmax')])
opt = RMSprop(learning_rate=0.005)
model.compile(
    optimizer=opt,
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'])
model.fit(X, Y, epochs=10, verbose=1, batch_size=32)

The parameter counts are 113k for the TCN and 93k for the LSTM. After one epoch, which takes 216 seconds for the LSTM, the loss is 1.81; for the TCN the loss is 2.31 and the epoch takes 339 seconds. "shakespeare.txt" is a 1.1 MB text file containing Shakespeare plays.
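(For reference, those counts are just what Keras reports; a minimal way to print them for whichever layer is currently enabled in the Sequential model above:)

model.summary()              # layer-by-layer parameter breakdown
print(model.count_params())  # total parameter count for the enabled variant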

@bjourne can you run it for more than just one epoch?
For a fairer comparison, I would use nb_stacks=1, nb_filters=128 and a larger kernel_size such as 5 or 7. You can then adjust the dilations so that the TCN has roughly the same number of parameters as your LSTM. But keep in mind that an LSTM has a memory of around 50-100 steps, so make sure your TCN has a receptive field at least that large (a quick way to estimate it is sketched below).
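Here is a rough estimate, assuming the usual two dilated convolutions per residual block; the library exposes the exact value as TCN(...).receptive_field, so trust that one if they disagree:

# Rough receptive-field estimate for a TCN, assuming two dilated convolutions
# per residual block (hypothetical helper, not part of the library).
def approx_receptive_field(kernel_size, dilations, nb_stacks=1):
    return 1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations)

# Config from the original snippet: barely covers SEQ_LEN = 64.
print(approx_receptive_field(kernel_size=2, dilations=[1, 2, 4, 8], nb_stacks=2))  # 61

# Suggested config: covers 64 steps with room to spare.
print(approx_receptive_field(kernel_size=5, dilations=[1, 2, 4, 8], nb_stacks=1))  # 121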

rjpg commented

Also, in my case studies WaveNet is better ...

I am revisiting this issue.

On the smaller version of the shakespeare.txt dataset, I could get:

  • TCN: 48.7% accuracy on the test set.
  • LSTM: 49.5% accuracy on the test set.

Speed-wise the two are also close: I think it took about 17 s for the LSTM and 26 s for the TCN.

TCNs are, of course, optimized for very long sequences.

So the gap observed earlier is no longer there.

Here is the code:

import numpy as np
from tcn import TCN, tcn_full_summary
from tensorflow.keras import Model
from tensorflow.keras.layers import Embedding, LSTM, Dense, Input
from tensorflow.keras.optimizers import Adam
import random as python_random
import tensorflow as tf

SEQ_LEN = 64
text = open('shakespeare.txt').read()
ch2ix = {ch: i for i, ch in enumerate(sorted(set(text)))}
X = np.array([[ch2ix[ch] for ch in text[i - SEQ_LEN:i]]
              for i in range(SEQ_LEN, len(text))])
Y = np.array([ch2ix[text[i]] for i in range(SEQ_LEN, len(text))])

# Both variants reach roughly the same performance; flip this to compare.
use_lstm = False

# The TCN could reach 48.7% with:
#   nb_filters=64,
#   dilations=(1, 2, 4, 8, 16, 32),
#   kernel_size=2,
#   use_skip_connections=True,
#   use_layer_norm=True
# The LSTM could reach 49.5%.

tcn = TCN(nb_filters=64,
          dilations=(1, 2, 4, 8, 16, 32),
          kernel_size=2,
          use_skip_connections=True,
          use_layer_norm=True)
print(tcn.receptive_field)  # make sure it covers SEQ_LEN = 64

i = Input(shape=(SEQ_LEN,))
x = Embedding(len(ch2ix), 32)(i)
x = LSTM(128)(x) if use_lstm else tcn(x)
x = Dense(len(ch2ix), activation='softmax')(x)
m = Model(inputs=[i], outputs=[x])

tcn_full_summary(m)

m.compile(
    optimizer=Adam(learning_rate=0.005),
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'])

# Seeds for reproducibility. Note that the model has already been built at
# this point, so its weight initialization is not covered; setting these at
# the top of the script would make the whole run reproducible.
np.random.seed(123)
python_random.seed(123)
tf.random.set_seed(1234)

m.fit(X, Y, epochs=20, batch_size=32, validation_split=0.2)

I'll close this as the recent PRs have fixed most of the problems described in this issue. The LSTM still outperforms the TCN here (on the 92 KB shakespeare.txt) because the problem suits an LSTM better: char[t+1] depends heavily on char[t] and char[t-1], and not much on characters 200 steps back.