julianser/hed-dlg-truncated

How to run this code with a new dataset?


egrcc commented

Hi,

I want to run this code with a new dataset and I have some questions about that.

  1. Based on your description in Creating Datasets, I think the text files should look like this (one dialogue shown):
Good morning ! </s> Good morning ! </s> How are you ? </s> I am fine . </s>

Is this right? The Ubuntu dataset you provide at http://www.iulianserban.com/Files/UbuntuDialogueCorpus.zip uses a different format. For example, in raw_test_text.txt:

anyone knows why my stock oneiric exports env var ' **unknown** I mean what is that used for ? I know of $USER but not $USERNAME . My precise install doesn't export USERNAME __eou__ __eot__ looks like it used to be exported by lightdm , but the line had the comment " // **unknown** : Is this required ?" so I guess it isn't surprising it is gone __eou__ __eot__ thanks ! How the heck did you figure that out ? __eou__ __eot__ https://bugs.launchpad.net/lightdm/+bug/864109/comments/3 __eou__ __eot__ nice thanks ! __eou__

It does not contain "</s>". What do "__eou__" and "__eot__" mean?
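
To make the format I have in mind concrete, here is a minimal Python sketch of how I would build one dialogue line. The helper names join_utterances and split_on_marker are my own and are not part of this repository, and the "</s>" layout is only my reading of Creating Datasets, so please treat it as an assumption rather than the expected input.

```python
def join_utterances(utterances, end_sym="</s>"):
    """Join utterances into one line, following each one with end_sym.

    Assumption: every utterance, including the last, is followed by end_sym.
    """
    return " ".join("{} {}".format(u.strip(), end_sym) for u in utterances)


def split_on_marker(line, marker):
    """Split a line on a marker token and drop empty pieces."""
    return [piece.strip() for piece in line.split(marker) if piece.strip()]


dialogue = ["Good morning !", "Good morning !", "How are you ?", "I am fine ."]
line = join_utterances(dialogue)
print(line)

# Splitting on the marker recovers the original utterances.
print(split_on_marker(line, "</s>"))

# The same helper can be pointed at the markers used in the Ubuntu files.
print(split_on_marker("nice thanks ! __eou__", "__eou__"))
```

If the expected format is different, only the end_sym argument and the joining rule should need to change.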

  2. In state.py, what is the difference between state['end_sym_utterance'] and state['eos_sym']? In prototype_ubuntu_HRED, state['end_sym_utterance'] is set to '__eot__'. My dataset does not contain the symbol '__eot__', so what value should I set state['end_sym_utterance'] to?

Thanks~