codekansas/keras-language-modeling

Lambda does not support masking

mathrb opened this issue · 32 comments

Hello,

Thanks for sharing your experience on this subject.

I've got an issue running insurance_qa_eval.py. Here's the full stack trace:
Traceback (most recent call last):
  File "insurance_qa_eval.py", line 274, in <module>
    model.compile(optimizer=optimizer)
  File "/home/myuser/tests/insurance_qna/keras-language-modeling/keras_models.py", line 114, in compile
    qa_model = self.get_qa_model()
  File "/home/myuser/tests/insurance_qna/keras-language-modeling/keras_models.py", line 101, in get_qa_model
    self._models = self.build()
  File "/home/myuser/tests/insurance_qna/keras-language-modeling/keras_models.py", line 279, in build
    question_pool = merge([maxpool(question_f_dropout), maxpool(question_b_dropout)], mode='concat', concat_axis=-1)
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 149, in create_node
    output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 578, in compute_mask
    'but was passed an input_mask: ' + str(input_mask))
Exception: Layer lambda_1 does not support masking, but was passed an input_mask: Elemwise{neq,no_inplace}.0

I'm using

  • Keras 1.0.2
  • Theano 0.8.2

Thanks

It is a small change in the Keras source code (set the supports_masking class variable in the Lambda layer to True instead of False). Otherwise there isn't a way to do this. Masking isn't really necessary though.

You could also set the attribute on the Lambda layer instance by adding the line

maxpool.__setattr__('supports_masking', True)
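
For reference, here is a minimal sketch of the two workarounds in one place (assuming Keras 1.x; the maxpool definition below is only illustrative, not copied from keras_models.py):

from keras import backend as K
from keras.layers import Lambda

# Workaround 1: monkey-patch the class attribute (equivalent to editing the Keras source)
# so every Lambda layer claims to support masking.
Lambda.supports_masking = True

# Workaround 2: set the attribute on a single layer instance only.
maxpool = Lambda(lambda x: K.max(x, axis=1, keepdims=False),
                 output_shape=lambda x: (x[0], x[2]))  # illustrative max-over-time pooling
maxpool.supports_masking = True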

But both changes raise a new error

Exception: Merge does not support masking, but was passed an input mask

caused by the line

question_pool = merge([maxpool(question_f_dropout), maxpool(question_b_dropout)], mode='concat', concat_axis=-1)

Ah yeah, merge doesn't support masking yet; you can use this pull request -> keras-team/keras#2413

Ok thanks :)

Hi @codekansas,
Thanks for this promising package. I look forward to testing it.
Even after applying the pull request you mention above, I get an error. Any ideas?
I typed the following command:
python insurance_qa_eval.py
I get this output:
Epoch 1 :: 2016-05-13 04:23:07 :: Train on 18540 samples, validate on 3708 samples
Epoch 1/1
Traceback (most recent call last):
File "insurance_qa_eval.py", line 289, in
evaluator.train(model)
File "insurance_qa_eval.py", line 127, in train
hist = model.fit([questions, good_answers, bad_answers], nb_epoch=1, batch_size=batch_size, validation_split=split)
File "/Users/username/Documents/keras-language-modeling/keras_models.py", line 132, in fit
return self.training_model.fit(x, y, **kwargs)
File "/Users/username/anaconda/lib/python2.7/site-packages/keras/engine/training.py", line 1011, in fit
callback_metrics=callback_metrics)
File "/Users/username/anaconda/lib/python2.7/site-packages/keras/engine/training.py", line 749, in _fit_loop
outs = f(ins_batch)
File "/Users/username/anaconda/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 513, in call
return self.function(_inputs)
File "/Users/username/anaconda/lib/python2.7/site-packages/theano/compile/function_module.py", line 786, in call
allow_downcast=s.allow_downcast)
File "/Users/username/anaconda/lib/python2.7/site-packages/theano/tensor/type.py", line 177, in filter
data.shape))
TypeError: ('Bad input argument to theano function with name "/Users/username/anaconda/lib/python2.7/site-packages/keras/backend/theano_backend.py:509" at index 3(0-based)', 'Wrong number of dimensions: expected 1, got 2 with shape (128, 1).')

Thanks for your help!

Oh I'm not sure. To be honest, I've been messing around with stuff a lot and it's far from cohesive. I can't tell much from the error report... You might try cloning again. I'm not going to be able to work on this much until June, unfortunately.

Got it. Thanks for your quick response! I'll read the pull request comments a bit more carefully to see if I can make it work. Good luck with your projects!

@migueljette Any update? I have met the same error with the pull request. I am now trying to make it work as well.

@eshijia Unfortunately, no. I had to put this aside for a few weeks as well. I was hoping to test it out quickly, and I didn't have any time to debug it at all. Do you have ideas for debugging?

@migueljette Yes! After git reset with a few specific commits, I found some problems. In the LanguageModel class (keras_model.py), @codekansas had changed the line 107 from
qa_model = merge([question_output, answer_output], mode=similarity, output_shape=lambda x: x[:-1])
to
qa_model = merge([question_output, answer_output], mode=similarity, output_shape=lambda x: x[0][:-1])
If you revert that change (i.e. go back to the original line), insurance_qa_eval.py will work.
I am still reading and running the code to verify some results. Maybe @codekansas can give more explanations when he works on this again.
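
To see concretely what changed, here is a small illustration (with hypothetical shapes) of what the two output_shape lambdas return when Keras hands them the list of input shapes for a Merge layer:

# Merge receives the shapes of its two inputs as a list of tuples (shapes are hypothetical).
shapes = [(None, 40, 100), (None, 40, 100)]

old = (lambda x: x[:-1])(shapes)     # [(None, 40, 100)]  -- drops the last *list element*, leaving a list
new = (lambda x: x[0][:-1])(shapes)  # (None, 40)         -- drops the last *dimension* of the first shape
print(old, new)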

Hi @eshijia ! Thanks for the reply! That totally works!! Now I have to test the model and framework with my own data. Ha ha! Thanks again. I hope I have a bit more time soon to play with this.

@migueljette Cool! In my experiments with my own data, the basic 'cosine' similarity metric brings the best results. In addition, I haven't reproduced the results with Insurance_QA that the blog mentioned. I hope @codekansas can share the model parameter settings behind his results.

I didn't reproduce the results from the paper; I'm not sure how they got the results they did. I wondered if they just let it train for a really long time.

I got around 55% accuracy for the ConvNet and didn't focus much on the RNN. If someone can get better results I would be very interested.

Thanks for the reply! I meant reproducing the results of your own experiments mentioned in the blog, not the results from the paper.

Oh sure! Hmm... The major thing was using a lot of dimensions; I found 1000 embedding dimensions worked well (no word2vec pretraining). I think this commit has the right parameters (cosine similarity, margin of 0.009 maybe?). I found that varying the margin from 0.009 to 0.2 was a good range, and changing the similarity might help. It is pretty fast to train on my GPU, so you can try different sets of parameters, but I think those parameters will give you similar results to what I got after ~100 epochs (unfortunately I don't have my desktop with me right now, so I can't run it myself).
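
As a sketch, those suggestions map onto the configuration keys that appear later in this thread roughly as follows (the minimal dictionary below is hypothetical, just to show where each value plugs in):

conf = {
    'margin': 0.009,                          # try values in the 0.009 - 0.2 range
    'model_params': {'n_embed_dims': 1000},   # large embedding, no word2vec pretraining
    'similarity_params': {'mode': 'cosine'},  # cosine similarity reportedly worked best
}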

OK, Got it! I am training the model on my GPU. I will share my results in the future! Thanks again!

Hi, I have changed my Keras installation as stated in keras-team/keras#2413,
after installing Keras 1.0.3.
I then changed the following line in keras_models.py:
qa_model = merge([question_output, answer_output], mode=similarity, output_shape=lambda x: x[0][:-1])
back to:
qa_model = merge([question_output, answer_output], mode=similarity, output_shape=lambda x: x[:-1])

Now I am getting this error:

Epoch 1 :: 2016-06-06 08:27:43 :: Traceback (most recent call last):
File "insurance_qa_eval.py", line 291, in
evaluator.train(model)
File "insurance_qa_eval.py", line 130, in train
hist = model.fit([questions, good_answers, bad_answers], nb_epoch=1, batch_size=batch_size, validation_split=split)
File "/home/ubuntu/keras-language-modeling/keras_models.py", line 132, in fit
return self.training_model.fit(x, y, **kwargs)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/training.py", line 994, in fit
batch_size=batch_size)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/training.py", line 925, in _standardize_user_data
exception_prefix='model target')
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/training.py", line 104, in standardize_input_data
str(array.shape))
Exception: Error when checking model target: expected merge_4 to have shape (None, 282) but got array with shape (18540, 1)

What have I done wrong? I notice the change to the Keras source also involves changes in the test folder. Do I need to replace merge with the version from the test file?

Thanks.

@wailoktam I think you should uninstall your Keras installation and re-install it by cloning codekansas's Keras repo and running python setup.py install. You can create a virtual environment to do that.

Hi, I replaced my Keras installation with what is found here:

https://github.com/codekansas/keras

Now it is yielding this warning:

/home/ubuntu/anaconda2/lib/python2.7/site-packages/Keras-1.0.1-py2.7.egg/keras/backend/theano_backend.py:509: UserWarning: theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 3 is not part of the computational graph needed to compute the outputs: merge_4_target.
To make this warning into an error, you can pass the parameter on_unused_input='raise' to theano.function. To disable it completely, use on_unused_input='ignore'.
**kwargs)

I have changed the following line in keras_models.py:
qa_model = merge([question_output, answer_output], mode=similarity, output_shape=lambda x: x[0][:-1])
back to:
qa_model = merge([question_output, answer_output], mode=similarity, output_shape=lambda x: x[:-1])

following @eshijia's suggestion.

Can anybody help?

Yes, I also get this warning, but it has no effect; the code still works. I think the warning is due to the change from the pull request.

I get this too; I'm not sure why it happens (it has happened in other Keras projects). I will change the master branch back (I changed it originally without testing it).

Hi, thanks for your prompt responses. I have not yet started checking the cause of the warning. I once got a similar warning in my own code because I defined a custom objective function in a way Theano didn't like. The warning always pointed to the last layer, no matter what it was. Once I changed to a default objective function, the warning disappeared. It may be irrelevant this time, though.

Hi, I am trying to reduce the size of the training set and the number of training epochs so that I can speed up debugging (I am looking for the cause of the warning). However, I find some of the code puzzling.

In line 117, I find the following line, which I believe is for creating negative training samples:

bad_answers = self.pada(random.sample(self.answers.values(), len(good_answers)))

I think there is some chance that the random sampling will actually pick the correct answer.

At line 126, I find the arguments good_answers and bad_answers puzzling:

hist = model.fit([questions, good_answers, bad_answers], nb_epoch=1, batch_size=batch_size, validation_split=split)

It does not quite match the usage described in the Keras documentation given below:

fit(self, x, y, batch_size=32, nb_epoch=10, verbose=1, callbacks=[], validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None)

x: Numpy array of training data, or list of Numpy arrays if the model has multiple inputs. If all inputs in the model are named, you can also pass a dictionary mapping input names to Numpy arrays.
y: Numpy array of target data, or list of Numpy arrays if the model has multiple outputs. If all outputs in the model are named, you can also pass a dictionary mapping output names to Numpy arrays

I am not sure what bad_answers corresponds to in this line.

Can anyone explain what is going on here? Many thanks.

Hmm... Line 117 is indeed for creating negative training samples. It may occasionally pick the correct answer, but that will not noticeably hurt training after some epochs. (Of course, maybe we can find a better way to create negative samples.)
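
For what it's worth, here is a minimal sketch of one such alternative, assuming answers is a dict mapping answer ids to token lists and the ground-truth ids for each question are known (the helper name and data layout are illustrative, not taken from the repo):

import random

def sample_negatives(answers, good_ids, n):
    # Sample n answer ids that are guaranteed not to be ground-truth answers.
    good = set(good_ids)
    candidates = [aid for aid in answers if aid not in good]
    return random.sample(candidates, n)

# Example: draw two negatives for a question whose correct answers are ids 3 and 7.
answers = {i: ['token'] for i in range(10)}
print(sample_negatives(answers, good_ids=[3, 7], n=2))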

As for line 126, I think you need to understand more about the functional API of Keras, not just the sequential API.

Thanks for your prompt response. I think x in fit is the training samples, whereas y in fit is the labels (1 for a correct answer and 0 for an incorrect one), and x and y must have the same number of elements? The documentation does say either x or y can be a list, but here we only pass one list, which I can see corresponds to x. Where is the array of labels for training?

I figured out how to get rid of the warning about the unused input:

Changing the following line:

self.training_model.compile(loss=lambda y_true, y_pred: y_pred, optimizer=optimizer, **kwargs)

to:

self.training_model.compile(loss=lambda y_true, y_pred: y_pred + y_true - y_true, optimizer=optimizer, **kwargs)

However, I don't quite get what exactly the first line is trying to do. I can see that the author is moving the calculation of the loss into an earlier layer, and that this line is supposed to just pass through the output of that layer.

If the author can explain the details, I would be extremely grateful.

The loss function is a hinge loss, i.e. loss = margin - cos(question, good_answer) + cos(question, bad_answer) where cos is the cosine similarity between the two things. We're trying to minimize this value. Cosine similarity between two layers can be done with a merge operation. To fit this idea within the schema of Keras, we can do another merge to get the desired loss and call that value y_pred, which becomes the value we want to minimize.

We basically have two models, one for the question and one for the answer, and we're learning them at the same time to be close to each other rather than close to some predicted value. This isn't really what Keras was set up to do, since the usual way you characterize this is with a single model which learns to be close to a specified output.

Thanks for pointing out the bit about the unused input; it makes sense now. The way I set it up, y_true is always zero (I think; I haven't looked at it in a while), so you could change it to just y_pred + y_true.
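
To make the idea concrete, here is a small numpy sketch of the quantity being minimized (illustrative only; the real model computes this with merge layers inside Keras, and the max(0, ...) clamp is just the usual hinge form):

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hinge_loss(question, good_answer, bad_answer, margin=0.009):
    # The training model's "prediction" is exactly this value, so the compiled Keras loss
    # only needs to pass y_pred through (y_true is fed as zeros and is unused).
    return max(0.0, margin - cosine(question, good_answer) + cosine(question, bad_answer))

q, good, bad = np.random.rand(3, 100)
print(hinge_loss(q, good, bad))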

@eshijia Can you share the results you promised? :-)

I've trained the embedding model out of the box. I've got the following numbers, which are way below what is indicated in these notes:

Top-1 Precision: [0.18611111111111112, 0.18722222222222223, 0.184]
MRR: [0.2722904433042418, 0.26672897883551283, 0.26999670775205087]

@mossaab0 Each model is highly dependent on the parameters you pass it and how long you train it for. I've messed around with them a lot, and the parameters in the project aren't the ones I got the best performance with. For the Embedding model, I got the best performance by using around 1000 embedding dimensions and not pretraining the weights with Word2Vec. The model should be trained to minimize error on a validation set. The results you pasted are about what I got after using a 100-dimension Embedding model, I think (it's been a while since I looked at it).

I've retrained the embedding model with 1000 dimensions, without loading the pretrained w2v model. The numbers are only slightly better than what I had mentioned with the default setup:

Top-1 Precision: [0.18722222222222223, 0.1922222222222222, 0.206]
MRR: [0.28435607158749127, 0.2773795710911621, 0.2948951984190565]

Are the models more sensitive to parameter tuning (e.g., dim = 100 vs. 1000) than to the architecture (e.g., Embedding vs. Embedding + CNN)?

If someone can share their setup on how to get some reasonable top-1 accuracy (e.g., > 0.40), that would be really appreciated.

You can try this:

conf = {
    'question_len': 40,
    'answer_len': 40,
    'n_words': 22353, # len(vocabulary) + 1
    'margin': 0.05,

    'training_params': {
        'save_every': 1,
        'batch_size': 128,
        'nb_epoch': 1000,
        'validation_split': 0.2,
        'optimizer': 'adam',

        'evaluate_all_threshold': {
            'mode': 'all',
            'top1': 0.4,
        },
    },

    'model_params': {
        'n_embed_dims': 1000,
    },

    'similarity_params': {
        'mode': 'cosine',
    }
}

evaluator = Evaluator(conf)

model = EmbeddingModel(conf)
optimizer = conf.get('training_params', dict()).get('optimizer', 'adam')
model.compile(optimizer=optimizer)

evaluator.train(model)

Try tweaking the margin as well (usually 0.009 - 0.2). You can play around with other settings to try to get better results (e.g. the similarity type). evaluate_all_threshold controls when to evaluate on the test set; in this case, if the top-1 validation accuracy for all the validation sets is above 0.4, then it will evaluate on the entire test set.
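
A rough sketch of that threshold logic with mode 'all' (the variable names and example scores are hypothetical, not the repo's actual code):

conf = {'training_params': {'evaluate_all_threshold': {'mode': 'all', 'top1': 0.4}}}
threshold = conf['training_params']['evaluate_all_threshold']['top1']

# Hypothetical top-1 precision on each validation split after an epoch.
validation_top1 = [0.42, 0.45, 0.41]

if all(score >= threshold for score in validation_top1):
    print('threshold met on every split -- evaluate on the full test sets')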

Man, that helped. Thanks a lot!

----- test1 -----
Top-1 Precision: 0.518889
MRR: 0.632561
----- test2 -----
Top-1 Precision: 0.493889
MRR: 0.607853
----- dev -----
Top-1 Precision: 0.513000
MRR: 0.627165