RNN not converging
joetigger opened this issue · 6 comments
I'm new to theanets and found it quite different from keras. I tried a trivial example: predicting the time series {0,1,2,3,4, 0,1,2,3,4, ...}.
It worked very well with keras:
```python
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import GRU

def prepare(data, steps=4, split=0.15):
    X, Y = [], []
    for i in range(0, data.shape[0] - steps):
        X.append(data[i:i + steps, :])   # window of `steps` consecutive values
        Y.append(data[i + steps, :])     # the single value right after the window
    ntrn = int(len(X) * (1 - split))
    X_train, Y_train = np.array(X[:ntrn]), np.array(Y[:ntrn])
    X_test, Y_test = np.array(X[ntrn:]), np.array(Y[ntrn:])
    return (X_train, Y_train), (X_test, Y_test)

np.random.seed(0)
data = np.arange(5).reshape((5, 1))
for i in range(10):
    data = np.append(data, data, axis=0)
(X_train, y_train), (X_test, y_test) = prepare(data)

in_out_neurons = 1
hidden_neurons = 10

model = Sequential()
model.add(GRU(hidden_neurons, input_dim=in_out_neurons, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(in_out_neurons))
model.add(Activation("linear"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")
print("Model compiled.")

model.fit(X_train, y_train, batch_size=10, nb_epoch=10, validation_split=0.1)
predicted = model.predict(X_test)
print(np.sqrt(((predicted - y_test) ** 2).mean(axis=0)).mean())
print(predicted)
```
But it did not work with theanets:
```python
import numpy as np
import theanets

def prepare(data, steps=4, split=0.15):
    X, Y = [], []
    for i in range(0, data.shape[0] - steps):
        X.append(data[i:i + steps, :])
        # One target per time step: the input window shifted forward by one.
        Y.append(data[i + 1:i + 1 + steps, :])
    ntrn = int(len(X) * (1 - split))
    X_train, Y_train = np.array(X[:ntrn]), np.array(Y[:ntrn])
    X_test, Y_test = np.array(X[ntrn:]), np.array(Y[ntrn:])
    return (X_train, Y_train), (X_test, Y_test)

np.random.seed(0)
data = np.arange(5).reshape((5, 1))
for i in range(10):
    data = np.append(data, data, axis=0)
(X_train, y_train), (X_test, y_test) = prepare(data)

in_out_neurons = 1
hidden_neurons = 10

net = theanets.recurrent.Regressor(
    (in_out_neurons, dict(size=hidden_neurons, form='gru'), in_out_neurons))
net.train([X_train, y_train], [X_test, y_test], hidden_dropout=0.2, algo='rmsprop')
predicted = net.predict(X_test)
print(np.sqrt(((predicted - y_test) ** 2).mean(axis=0)).mean())
print(predicted)
```
How do I tell the RNN to ignore the first n outputs (i.e., while the RNN is ramping up)? Keras does this automatically (X_train[x,0:t,:] => y_train[x,t,:]), but theanets expects X_train and y_train to have the same number of time steps.
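The difference between the two `prepare` functions can be checked directly with NumPy. This standalone sketch builds the same 5120-row series (using `np.tile` in place of the doubling loop, which yields the same array) and shows that the per-step targets built for theanets end with exactly the single target keras uses; only the earlier, ramp-up steps add extra targets.

```python
import numpy as np

# Same series as the scripts: [0..4] repeated 2**10 times -> 5120 rows.
data = np.tile(np.arange(5), 2 ** 10).reshape((-1, 1))
steps = 4
n = data.shape[0] - steps

X = np.array([data[i:i + steps, :] for i in range(n)])              # (5116, 4, 1)
# keras-style prepare: one target per window, the value right after it
y_last = np.array([data[i + steps, :] for i in range(n)])           # (5116, 1)
# theanets-style prepare: one target per time step
y_seq = np.array([data[i + 1:i + 1 + steps, :] for i in range(n)])  # (5116, 4, 1)

# At steps 0..2 the per-step target is simply the next input value...
assert np.array_equal(y_seq[:, :-1, :], X[:, 1:, :])
# ...and at the last step it equals the keras target.
assert np.array_equal(y_seq[:, -1, :], y_last)
print(X.shape, y_last.shape, y_seq.shape)
```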
You can tell theanets to ignore some of the target outputs by creating a weighted model and passing an additional array of weights during training. Create your model using

```python
net = theanets.recurrent.Regressor((in_out_neurons, dict(size=hidden_neurons, form='gru'), in_out_neurons), weighted=True)
```

and then provide a third array in your training/validation sets that contains a 1 for values to retain and a 0 for values to ignore.
See examples/lstm-chime.py for an example.
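A minimal sketch of the 0/1 weight array described above, paired with a weighted mean squared error normalized by the sum of the weights. Both helper names are hypothetical, and the normalization is an assumption about how a weighted loss is typically computed, not a description of theanets internals; the point is only that time steps with weight 0 stop contributing to the loss.

```python
import numpy as np

def ramp_up_mask(targets, n_ignore):
    """Hypothetical helper: weight 1 where the loss should count, 0 for the
    first `n_ignore` time steps of every sequence (axis 1 is time)."""
    mask = np.ones_like(targets, dtype='float32')
    mask[:, :n_ignore, :] = 0
    return mask

def weighted_mse(pred, target, weights):
    # Assumed form: weighted squared error divided by the total weight, so
    # masked entries count in neither the numerator nor the denominator.
    return float((weights * (pred - target) ** 2).sum() / weights.sum())

target = np.zeros((2, 4, 1), dtype='float32')
pred = target.copy()
pred[:, 0, :] = 10.0  # large errors during the first (ramp-up) step only

mask = ramp_up_mask(target, n_ignore=1)
print(weighted_mse(pred, target, mask))                # 0.0: ramp-up errors ignored
print(weighted_mse(pred, target, np.ones_like(mask)))  # 25.0: all steps counted
```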
I tried using "weighted=True" but it still didn't converge as well as keras. Here's the updated code:
```python
import numpy as np
import theanets

def prepare(data, steps=4, split=0.15):
    X, Y = [], []
    for i in range(0, data.shape[0] - steps):
        X.append(data[i:i + steps, :])
        Y.append(data[i + 1:i + 1 + steps, :])
    ntrn = int(len(X) * (1 - split))
    X_train, Y_train = np.array(X[:ntrn]), np.array(Y[:ntrn])
    X_test, Y_test = np.array(X[ntrn:]), np.array(Y[ntrn:])
    return (X_train, Y_train), (X_test, Y_test)

np.random.seed(0)
data = np.arange(5).reshape((5, 1))
for i in range(10):
    data = np.append(data, data, axis=0)
(X_train, y_train), (X_test, y_test) = prepare(data)

# Weight 0 for the first 3 time steps (ramp-up), 1 for the rest;
# with steps=4 only the final step counts, as in the keras setup.
mask_train = np.ones_like(y_train)
mask_train[:, :3, :] = 0
mask_test = np.ones_like(y_test)
mask_test[:, :3, :] = 0

in_out_neurons = 1
hidden_neurons = 10

net = theanets.recurrent.Regressor(
    (in_out_neurons, dict(size=hidden_neurons, form='gru'), in_out_neurons),
    weighted=True)
net.train([X_train, y_train, mask_train], [X_test, y_test, mask_test],
          hidden_dropout=0.2, algo='rmsprop')
predicted = net.predict(X_test)
print(np.sqrt(((predicted - y_test) ** 2).mean(axis=0)).mean())
print(predicted)
```
I can't figure out where I went wrong, so any help is appreciated.
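One factor worth checking before comparing the printed numbers: in the keras script `y_test` has shape (N, 1), so its RMSE covers only the predicted next value. In the theanets scripts `y_test` has one entry per time step, and assuming `net.predict` returns an array of that same shape, the same expression also averages in the ramp-up steps that the mask excludes from training. The toy arrays below are made up purely to show how much that can matter; slicing both arrays to the last step (`[:, -1, :]`) gives a figure comparable to the keras one.

```python
import numpy as np

def rmse(pred, target):
    return float(np.sqrt(((pred - target) ** 2).mean()))

# Made-up data: 3 test sequences of 4 steps, predicted perfectly at the
# last step but off by 2 during the three ramp-up steps.
y_test = np.tile(np.array([1.0, 2.0, 3.0, 4.0]).reshape((1, 4, 1)), (3, 1, 1))
predicted = y_test.copy()
predicted[:, :3, :] += 2.0

# The expression used in the scripts: per-step RMSE, then averaged over steps.
script_style = float(np.sqrt(((predicted - y_test) ** 2).mean(axis=0)).mean())
# RMSE over every step of every sequence.
all_steps = rmse(predicted, y_test)
# RMSE on the final step only, which is what the keras script measures.
last_step = rmse(predicted[:, -1, :], y_test[:, -1, :])

print(round(script_style, 3), round(all_steps, 3), round(last_step, 3))  # 1.5 1.732 0.0
```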
This mostly looks correct from a theanets usage perspective. But it looks like you're assigning the target output for example `i` to example `i+1` in `prepare`?

```python
Y.append(data[i+1:i+1+steps, :])
```

Shouldn't this be `data[i:i+1+steps, :]`?
Thanks for helping me review the code. I want to predict the next value in the time series, hence the `i+1`: the value at `i` is the input at the current time step.
Yes, but you have two `i+1` terms in there. The first axis of the data arrays indexes into training examples, so in your code

```python
X.append(data[i:i+steps, :])
Y.append(data[i+1:i+1+steps, :])
```

the `i`th example from `X` is being paired with the `i+1`th example from `Y`.

The second `i+1` seems fine; it's indexing the second axis, which is time.
Doh! Never mind, I see how it's set up now.
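Both readings of `prepare` hold at the same time, because consecutive windows overlap in all but one step: "target window `i` equals input window `i+1`" and "the target at step `t` is the input at step `t+1`" describe the same arrays. A small standalone check, using `np.arange(12)` so every value is distinct:

```python
import numpy as np

data = np.arange(12).reshape((12, 1))  # distinct values make the indexing visible
steps = 4
X = np.array([data[i:i + steps, :] for i in range(len(data) - steps)])
Y = np.array([data[i + 1:i + 1 + steps, :] for i in range(len(data) - steps)])

# Reading 1: target window i is the same as input window i+1.
assert np.array_equal(Y[:-1], X[1:])
# Reading 2: within one example, the target at step t is the input at step t+1,
# i.e. next-value prediction along the time axis.
assert np.array_equal(Y[:, :-1], X[:, 1:])
print(X[0].ravel(), Y[0].ravel())  # [0 1 2 3] [1 2 3 4]
```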
So this all looks OK from a theanets usage perspective. I can't really comment on the convergence behavior of this model vs. keras, though; you might need to muck around with the training hyperparameters, but that's not something I can provide much support for.
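Because the series is perfectly periodic, simple baselines give a yardstick for what "converged" should mean here. This standalone sketch (not part of either script) computes one-step-ahead RMSE for three naive predictors; a trained model that does not clearly beat the mean predictor has probably not learned the cycle, while a well-trained one should approach the exact rule.

```python
import numpy as np

series = np.tile(np.arange(5), 200).astype(float)
inputs, targets = series[:-1], series[1:]  # predict the next value from the current one

def rmse(pred, target):
    return float(np.sqrt(((pred - target) ** 2).mean()))

baselines = {
    'repeat last value': inputs,
    'predict the mean': np.full_like(targets, series.mean()),
    'exact rule (x + 1) % 5': (inputs + 1) % 5,
}
for name, pred in baselines.items():
    print(name, round(rmse(pred, targets), 3))
```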