julianser/hed-dlg-truncated

Exception: Error, malformed dictionary!

Closed this issue · 6 comments

rzai@rzai00:~/prj/hed-dlg-truncated$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python train.py --prototype prototype_ubuntu_LSTM > Model_Output.txt
Using gpu device 0: GeForce GTX 1080 (CNMeM is disabled)
2016-12-02 10:46:12,985: main: DEBUG: State:
{'add_latent_gaussian_per_utterance': False,
'bidirectional_utterance_encoder': False,
'bs': 80,
'collaps_to_standard_rnn': True,
'condition_decoder_only_on_latent_variable': False,
'condition_latent_variable_on_dcgm_encoder': False,
'condition_latent_variable_on_dialogue_encoder': False,
'cost_threshold': 1.003,
'cutoff': 1.0,
'decoder_bias_type': 'all',
'decoder_drop_previous_input_tokens': False,
'decoder_drop_previous_input_tokens_rate': 0.75,
'deep_dialogue_input': True,
'deep_direct_connection': False,
'deep_out': True,
'dialogue_encoder_gating': 'GRU',
'dialogue_rec_activation': 'lambda x: T.tanh(x)',
'dictionary': '../UbuntuData/Dataset.dict.pkl',
'direct_connection_between_encoders_and_decoder': False,
'end_sym_sentence': '__eot__',
'end_sym_utterance': '</s>',
'eod_sym': -1,
'eos_sym': 1,
'first_speaker_sym': -1,
'fix_encoder_parameters': False,
'fix_pretrained_word_embeddings': False,
'initialize_from_pretrained_word_embeddings': False,
'kl_divergence_annealing_rate': 1.6666666666666667e-05,
'latent_gaussian_linear_dynamics': False,
'latent_gaussian_per_utterance_dim': 10,
'level': 'DEBUG',
'loop_iters': 3000000,
'lr': 0.0002,
'max_grad_steps': 80,
'maxout_out': False,
'minerr': -1,
'minor_speaker_sym': -1,
'off_screen_sym': -1,
'oov': '',
'patience': 20,
'pause_sym': -1,
'prefix': 'UbuntuModel_',
'pretrained_word_embeddings_file': '',
'qdim_decoder': 2000,
'qdim_encoder': 10,
'rankdim': 300,
'reset_hidden_states_between_subsequences': False,
'reset_utterance_decoder_at_end_of_utterance': False,
'reset_utterance_encoder_at_end_of_utterance': False,
'save_dir': 'Output',
'scale_latent_variable_variances': 10,
'sdim': 10,
'second_speaker_sym': -1,
'seed': 1234,
'sent_rec_activation': 'lambda x: T.tanh(x)',
'sort_k_batches': 20,
'test_dialogues': '../UbuntuData/Test.dialogues.pkl',
'third_speaker_sym': -1,
'time_stop': 44640,
'train_dialogues': '../UbuntuData/Training.dialogues.pkl',
'train_freq': 10,
'train_latent_gaussians_with_kl_divergence_annealing': False,
'unk_sym': 0,
'updater': 'adam',
'use_nce': False,
'utterance_decoder_gating': 'LSTM',
'utterance_encoder_gating': 'GRU',
'valid_dialogues': '../UbuntuData/Validation.dialogues.pkl',
'valid_freq': 5000,
'voice_over_sym': -1}
2016-12-02 10:46:12,985: main: DEBUG: Timings:
{'train_cost': [],
'train_kl_divergence_cost': [],
'train_misclass': [],
'train_posterior_mean_variance': [],
'valid_cost': [],
'valid_emi': [],
'valid_kl_divergence_cost': [],
'valid_misclass': [],
'valid_posterior_mean_variance': []}
Traceback (most recent call last):
  File "train.py", line 531, in <module>
    main(args)
  File "train.py", line 157, in main
    model = DialogEncoderDecoder(state)
  File "/home/rzai/prj/hed-dlg-truncated/dialog_encdec.py", line 1637, in __init__
    raise Exception("Error, malformed dictionary!")
Exception: Error, malformed dictionary!
rzai@rzai00:~/prj/hed-dlg-truncated$

Hi,
I am facing the same issue! Have you found any solution?

@iclearn @lovejasmine Can either of you double-check that your dictionary actually contains the '__eot__' token?
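One way to run that check is sketched below. It is only a sketch under an assumption: that the unpickled dictionary is a list of tuples whose first two fields are the token and its id. The helper names `build_str_to_idx` and `missing_symbols` and the toy entries are hypothetical, not part of the repository.

```python
def build_str_to_idx(entries):
    """Map token -> id from dictionary entries.

    Assumption: each entry is a tuple whose first two fields are
    (token, token_id); any remaining fields are ignored.
    """
    return {entry[0]: entry[1] for entry in entries}


def missing_symbols(entries, required):
    """Return the required tokens that are absent from the dictionary."""
    str_to_idx = build_str_to_idx(entries)
    return [tok for tok in required if tok not in str_to_idx]


# Toy stand-in for the unpickled Dataset.dict.pkl (hypothetical contents).
toy_entries = [("<unk>", 0, 10, 5), ("__eot__", 1, 90, 40), ("hello", 2, 7, 3)]

print(missing_symbols(toy_entries, ["__eot__"]))  # []
print(missing_symbols(toy_entries, ["</s>"]))     # ['</s>']
```

To run it against the real file, load the pickle (with `cPickle` on Python 2 or `pickle` on Python 3) and pass the result in place of `toy_entries`, together with whatever value `end_sym_utterance` has in your state.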

Yes, it does! But the error is raised at line 1637 of dialog_encdec.py:
if self.end_sym_utterance not in self.str_to_idx: raise Exception("Error, malformed dictionary!")

and line 19 of state.py defines it as
state['end_sym_utterance'] = '</s>'
and '</s>' is not in the dictionary. So maybe that is what is causing the problem?

Also, line 567 of state.py defines
state['end_sym_sentence'] = '__eot__'
which is not used elsewhere in the code.

I think I had the same issue once, and it worked after I renamed the '__eot__' token to '</s>' in my dictionary. I assume that somewhere in the code the end-of-turn token is defined as '</s>', regardless of the definition in state.py.

You should debug this a bit more before changing the dictionary.

What is the value of "self.end_sym_utterance" before getting the error "Error, malformed dictionary!"?

I've fixed the issue. The problem was in the model state, where 'end_sym_sentence' should have been renamed to 'end_sym_utterance'.
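For anyone hitting the same error, the effect of that rename on the state dict can be sketched as follows. This is an illustration only: `rename_state_key` is a hypothetical helper, and the two-entry state is a toy excerpt built from the values quoted earlier in this thread. In the repository itself the fix is an edit to the prototype in state.py.

```python
def rename_state_key(state, old_key, new_key):
    """Return a copy of state with old_key's value moved to new_key."""
    fixed = dict(state)
    if old_key in fixed:
        # Overwrites any default already stored under new_key.
        fixed[new_key] = fixed.pop(old_key)
    return fixed


# Toy excerpt: the default '</s>' plus the misnamed override (hypothetical).
broken_state = {"end_sym_utterance": "</s>", "end_sym_sentence": "__eot__"}

fixed_state = rename_state_key(broken_state, "end_sym_sentence", "end_sym_utterance")
print(fixed_state)  # {'end_sym_utterance': '__eot__'}
```

After the rename, the token that the model looks up is the one the dictionary actually contains, so the check at line 1637 of dialog_encdec.py should no longer fail.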