meaningful result?
liuchenxjtu opened this issue · 15 comments
Hi Nicolas,
First of all, thanks a lot for your work. When I run your code, I cannot get meaningful results; all I get is output like:
INFO:lib.nn_model.train:[why ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[who ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[yeah ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[what is it ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[why not ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[really ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[huh ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[yes ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[what ' s that ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what are you doing ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what are you talking about ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what happened ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[hello ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[where ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[how ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[excuse me ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[who are you ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what do you want ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what ' s wrong ?] -> [i ' . . $$$ .
or
INFO:lib.nn_model.train:[what are you talking about ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[what happened ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[hello ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[where ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[how ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[excuse me ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[who are you ?] -> [i ' . . . . . . . . , , , , , ,]
Could you share your opinion with me? I'd really appreciate it.
@liuchenxjtu how many iterations have you finished when you trained the model?
Thanks for your reply. About 20. It is very slow on my machine. How many do you suggest? Do you have some sample results for different iterations?
Guys, I got similarly lame results yesterday...
My guess is that there are some foundational problems in this approach:
- Since word2vec vectors are used as word representations and the model returns an approximate vector for every next word, the error accumulates from one word to the next, and thus starting from the third word the model fails to predict anything meaningful... This problem might be overcome if we replace our approximate word2vec vector at every timestep with a "correct" vector, i.e. the one that corresponds to an actual word from the dictionary (see the sketch after the example outputs below). Does it make sense? However, you would need to dig into the seq2seq code to do that. @farizrahman4u could be quite helpful here.
- The second problem relates to word sampling: even if you manage to solve the aforementioned issue, as long as you stick to using argmax() for picking the most probable word at every timestep, the answers are going to be too simple and not interesting, like:
are you a human? -- no .
are you a robot or human? -- no .
are you a robot? -- no .
are you better than siri? -- yes .
are you here ? -- yes .
are you human? -- no .
are you really better than siri? -- yes .
are you there -- you ' re not going to be
are you there?!?! -- yes .
Not to mislead you: these results were achieved on a different seq2seq architecture, based on TensorFlow.
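To make the first point concrete, here is a minimal sketch of the word-snapping idea, assuming plain numpy, an embedding_matrix of shape (vocabulary_size, vector_dim) and an index_to_word list; all names are illustrative, not from this repo:

import numpy as np

def snap_to_vocabulary(predicted_vec, embedding_matrix, index_to_word):
    # Cosine similarity between the decoder's raw output and every word vector.
    sims = embedding_matrix.dot(predicted_vec) / (
        np.linalg.norm(embedding_matrix, axis=1) * np.linalg.norm(predicted_vec) + 1e-10)
    best = int(np.argmax(sims))
    # Feed the exact embedding of the nearest dictionary word into the next
    # timestep instead of the approximate prediction, so the error stops accumulating.
    return index_to_word[best], embedding_matrix[best]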
Sampling with temperature could be used to diversify the output results; however, that again would have to be done inside the seq2seq library.
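For reference, a minimal sketch of sampling with temperature, assuming the decoder yields a probability distribution probs over the token dictionary at each timestep (illustrative, not tied to any particular seq2seq implementation):

import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    # T < 1 sharpens the distribution towards argmax; T > 1 flattens it,
    # making rarer words more likely and the answers more diverse.
    logits = np.log(np.asarray(probs) + 1e-10) / temperature
    exp_logits = np.exp(logits - np.max(logits))
    rescaled = exp_logits / np.sum(exp_logits)
    # Draw the next token index from the rescaled distribution.
    return np.random.choice(len(rescaled), p=rescaled)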
@nicolas-ivanov Did you try the other models? Seq2seq, Seq2seq with peek, Attention Seq2seq, etc.?
I recently tested attention seq2seq on the babi dataset and it worked (100% val acc).
@farizrahman4u not yet, I'll set up an experiment with Attention Seq2seq now.
Meanwhile, could you please post a link to your dataset here? And some example results.
The standard babi dataset from facebook (used by keras in examples). I did it using a slightly different layer, but the idea is almost the same as attention seq2seq. I will be posting the code in a few days, as I have not tested on all the babi tasks yet.
Hello @farizrahman4u , I tried using the attention seq2seq model, but got a ShapeMismatch error.
This error doesn't occur when using the SimpleSeq2seq model. Is there anything that I am missing?
Please post your code.
@farizrahman4u : The following code is from the model.py file; I haven't changed much apart from the model name:
import os.path

from keras.models import Sequential
from seq2seq.models import AttentionSeq2seq
from seq2seq.models import SimpleSeq2seq
from seq2seq.models import Seq2seq

from configs.config import TOKEN_REPRESENTATION_SIZE, HIDDEN_LAYER_DIMENSION, SAMPLES_BATCH_SIZE, \
    INPUT_SEQUENCE_LENGTH, ANSWER_MAX_TOKEN_LENGTH, NN_MODEL_PATH
from utils.utils import get_logger

_logger = get_logger(__name__)


def get_nn_model(token_dict_size):
    _logger.info('Initializing NN model with the following params:')
    _logger.info('Input dimension: %s (token vector size)' % TOKEN_REPRESENTATION_SIZE)
    _logger.info('Hidden dimension: %s' % HIDDEN_LAYER_DIMENSION)
    _logger.info('Output dimension: %s (token dict size)' % token_dict_size)
    _logger.info('Input seq length: %s ' % INPUT_SEQUENCE_LENGTH)
    _logger.info('Output seq length: %s ' % ANSWER_MAX_TOKEN_LENGTH)
    _logger.info('Batch size: %s' % SAMPLES_BATCH_SIZE)

    model = Sequential()
    seq2seq = SimpleSeq2seq(
        input_dim=TOKEN_REPRESENTATION_SIZE,
        input_length=INPUT_SEQUENCE_LENGTH,
        hidden_dim=HIDDEN_LAYER_DIMENSION,
        output_dim=token_dict_size,
        output_length=ANSWER_MAX_TOKEN_LENGTH,
        depth=3
    )
    model.add(seq2seq)
    model.compile(loss='mse', optimizer='rmsprop')

    model.save_weights(NN_MODEL_PATH)

    # use previously saved model if it exists
    _logger.info('Looking for a model %s' % NN_MODEL_PATH)
    if os.path.isfile(NN_MODEL_PATH):
        _logger.info('Loading previously calculated weights...')
        model.load_weights(NN_MODEL_PATH)

    _logger.info('Model is built')
    return model
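For reference, swapping in the attention model would look like the sketch below, assuming AttentionSeq2seq takes the same constructor arguments as SimpleSeq2seq (as the library's README suggests); this is only a sketch, not a confirmed fix for the ShapeMismatch error:

# Hypothetical drop-in replacement for the SimpleSeq2seq block above.
seq2seq = AttentionSeq2seq(
    input_dim=TOKEN_REPRESENTATION_SIZE,
    input_length=INPUT_SEQUENCE_LENGTH,
    hidden_dim=HIDDEN_LAYER_DIMENSION,
    output_dim=token_dict_size,
    output_length=ANSWER_MAX_TOKEN_LENGTH,
    depth=1  # start shallow to rule out shape issues introduced by stacking
)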
Hi @nicolas-ivanov, you mentioned that the bad results above were obtained with a different, tensorflow-based architecture. What datasets and other settings did you use? What were the initial perplexity and the converged perplexity on both the training set and the validation set? I am trying to adapt the translation model example from tensorflow to train a chatbot; is it possible for you to give some details on these? Thanks.
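(For anyone comparing numbers: perplexity is just the exponential of the average per-token cross-entropy loss, so it can be read straight off a training log; a minimal sketch:)

import math

def perplexity(avg_cross_entropy_loss):
    # Assumes the loss is an average per-token cross-entropy in nats;
    # lower is better, and the "converged" value is the one reached
    # once the loss curve plateaus.
    return math.exp(avg_cross_entropy_loss)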
It was maybe due to too few training iterations or too small a dataset.
Hi,
Has anyone managed to run this project successfully?
Firstly, when I ran this project, I got the log below:
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
INFO:lib.nn_model.train:[Hi!] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[Hi] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[why ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[who ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[yeah ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what is it ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[why not ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[really ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[huh ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[yes ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what ' s that ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what are you doing ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what are you talking about ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what happened ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[hello ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[where ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[how ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[excuse me ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[who are you ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what do you want ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what ' s wrong ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[so ?] -> [raining raining raining raining raining raining]
Secondly, I changed the model code from SimpleSeq2seq to AttentionSeq2seq.
I found a small difference: each epoch takes noticeable time now (3s instead of 0s). But the output is still wrong.
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
INFO:lib.nn_model.train:[Hi!] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[Hi] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[why ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[who ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[yeah ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what is it ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[why not ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[really ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[huh ?] -> [raining raining raining raining raining raining]
Thanks a lot.
@KevinYuk I got the same "raining" result! Do you have any insight?
Just my opinion: repeating the same word means the model is 'not yet fitted'.
And it says 'loss: nan', which means something is wrong (an exploding loss, numerical issues, ...).
Please reconsider your hyperparameter settings (could you share your hyperparameters?)
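To make that concrete, a minimal sketch of the usual first aid for nan loss, assuming the Keras model from the model.py posted above (the values are illustrative, not tested on this repo): lower the learning rate and clip gradients.

from keras.optimizers import RMSprop

# Replace the plain 'rmsprop' string in model.compile() with a configured
# optimizer: a smaller learning rate plus gradient-norm clipping often
# stops the loss from blowing up to nan.
optimizer = RMSprop(lr=1e-4, clipnorm=5.0)
model.compile(loss='mse', optimizer=optimizer)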