mckinziebrandon/DeepChatModels

Training problem

b789 opened this issue · 7 comments

b789 commented

Hi,
I've tried to train an example model using the example_cornell.yml configuration, but after 6800 steps the model responds to every request with the same 'sentence': ..............
It seems the training isn't doing much, since at step 200 it prints:
training loss = 5.797; training perplexity = 329.45
Validation loss = 5.764; val perplexity = 318.69
and at step 6800 the values are almost the same:
training loss = 5.597; training perplexity = 269.70
Validation loss = 5.692; val perplexity = 296.40
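For what it's worth, the printed perplexity seems to just be exp(loss) (the numbers above line up modulo rounding), so the two metrics plateau together:

```python
import math

# Perplexity here appears to be exp(average cross-entropy loss), so the two
# metrics rise and fall together.
for loss in (5.797, 5.764, 5.597, 5.692):
    print("loss = {:.3f}  perplexity = {:.2f}".format(loss, math.exp(loss)))
# Matches the reported 329.45, 318.69, 269.70, 296.40 up to rounding
# of the logged loss values.
```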

What could be the cause of this? How can a reasonable output be obtained?

That's definitely not the expected behavior. It's been some time since I last ran the project, so let me try and reproduce your results over the weekend. I'll let you know if I see the same issue and how you may be able to resolve it.

In the meantime, could you try deleting all the extra files made by the model and running it again? By extra files, I mean the vocab, tfrecords, and *.ids files. I remember there being some subtle issues with them being reloaded that didn't occur when they were generated on a first run. I thought I had fixed those issues but this sounds related. I'll be able to provide more info after I try to reproduce the issue myself.
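Something along these lines should clear them out (the data directory below is just a placeholder; point it at whatever data path your config uses):

```python
import glob
import os

# Placeholder path: adjust to wherever your data/config actually points.
DATA_DIR = "data/cornell"

# Remove the generated artifacts so they are rebuilt from scratch on the
# next run: vocab files, tfrecords, and the *.ids files.
for pattern in ("vocab*", "*.tfrecords", "*.ids"):
    for path in glob.glob(os.path.join(DATA_DIR, pattern)):
        print("removing", path)
        os.remove(path)
```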

Hi @b789 , I was (unfortunately) able to reproduce your results! This is pretty surprising/annoying, but thanks for bringing my attention to it. My main GPU has been tied up with other work this weekend, so I wasn't able to explore much, but my best guess is that the AttentionDecoder is to blame. It uses a custom attention implementation I wrote back in the days of tf version ~1.1 when the API for doing that was very fragile and actually broke between minor releases.

With that said, and after skimming my loss plots in the wiki, you can get reasonable outputs if you just change AttentionDecoder to BasicDecoder in example_cornell.yml. I have confirmed on a smaller machine of mine that losses decrease as expected in that case.
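For anyone who just wants to flip that one setting, a quick-and-dirty option is a plain text substitution in the config (the path below is a guess; adjust it to wherever example_cornell.yml lives for you):

```python
# Blunt but effective: swap the decoder class name directly in the YAML,
# without needing to know the exact nesting of config keys.
CONFIG = "configs/example_cornell.yml"  # adjust to your local path

with open(CONFIG) as f:
    text = f.read()

with open(CONFIG, "w") as f:
    f.write(text.replace("AttentionDecoder", "BasicDecoder"))
```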

I'm definitely not satisfied with this...hopefully I have some time soon to see what is going on with AttentionDecoder. It used to work! And of course, if you happen to find a fix for it, contributions are more than welcome! Let me know if I can help with anything else, and if I do find out what's going on with AttentionDecoder soon, I will post updates here (and push the corrections to master).

It's okay @Shaptic everything will be ok.

b789 commented

Thanks @mckinziebrandon for the advice to use the BasicDecoder; with that change it's actually working.
I'll try to look into what the problem with the AttentionDecoder is, if I can.

@b789 or anyone interested: BasicEncoder does not return its full set of states over time (it returns only _, final_output, I believe), while BidirectionalEncoder returns both the full set of states and the final output. When designed, this was a feature, but it is now a bug! Happy to accept a PR that makes the tuple returned by BasicEncoder the same form as the one returned by BidirectionalEncoder.
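A rough sketch of the shape of that change, assuming the encoder wraps tf.nn.dynamic_rnn under the hood (the actual class and method names in the repo will differ):

```python
import tensorflow as tf  # written against the TF 1.x API this project targets

def encode(cell, inputs, lengths):
    """Unidirectional encoder that returns (outputs, final_state), i.e. the
    same tuple form a bidirectional encoder gives back, so an attention
    decoder has the full sequence of states to attend over."""
    # outputs: [batch, time, num_units] -- every state over time, needed
    # for attention. final_state: what is currently returned on its own.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, sequence_length=lengths, dtype=tf.float32)
    return outputs, final_state
```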

@mckinziebrandon thank you for this wonderful library!
Can you elaborate on the training process a little more? I mean, the config file requires 'train_from.txt', 'train_to.txt', 'valid_from.txt', and 'valid_to.txt' files, but I don't really understand what they should contain. I have one .txt file with the Cornell Movies Corpus; it contains rows that look like "question, answer". How should I split it?
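In case it's just a line-aligned split (line i of train_from.txt paired with line i of train_to.txt), this is what I'd try; the input filename, comma delimiter, and 90/10 split are my own guesses:

```python
import csv

# Guesses: the input is one "question, answer" pair per row, comma-delimited,
# and the *_from.txt / *_to.txt files are line-aligned prompt/response pairs.
pairs = []
with open("cornell_pairs.txt", newline="") as f:
    for row in csv.reader(f):
        if len(row) >= 2:
            pairs.append((row[0].strip(), row[1].strip()))

split = int(0.9 * len(pairs))  # 90/10 train/validation split, also a guess
for name, chunk in (("train", pairs[:split]), ("valid", pairs[split:])):
    with open(name + "_from.txt", "w") as f_from, \
         open(name + "_to.txt", "w") as f_to:
        for question, answer in chunk:
            f_from.write(question + "\n")
            f_to.write(answer + "\n")
```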

@mckinziebrandon sorry, I had missed the link to your Mega drive account; now I can see all your files.