how is the data organized?
QQToyota opened this issue · 1 comment
QQToyota commented
Hello, I read your code but couldn't find a description of the data format. Could you add one?
yuboxie commented
Hello,
I just uploaded the training data (tokenized). Please check the README file.
Briefly speaking, the format is:
- enc_input.npy: tokenized dialog context, with shape H * N * L
- enc_input_len.npy: utterance lengths of dialog context, with shape H * N
- enc_input_e.npy: emotion labels of the dialog context, with shape N * H * E
- dec_input.npy: tokenized decoder input, with shape N * (L+1)
- dec_input_len.npy: lengths of decoder input, with shape N
- target.npy: tokenized target utterance, with shape N * (L+1)
- hist_len.npy: lengths of dialog context, with shape N
where
- N is the number of data points
- H is the maximum dialog context length
- L is the maximum utterance length
- E is the number of emotion categories
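
For anyone who wants to sanity-check a download against the shapes listed above, here is a minimal sketch using NumPy. It is not code from the repository: the helper names `load_dataset` and `check_shapes` and the `data_dir` argument are made up for illustration, and the sketch assumes all seven files sit in one directory with exactly the names and axis orders given in this comment.

```python
import os
import numpy as np

# File stems as listed in the comment above (the ".npy" suffix is added on load).
FILES = ["enc_input", "enc_input_len", "enc_input_e",
         "dec_input", "dec_input_len", "target", "hist_len"]

def load_dataset(data_dir):
    """Load the seven arrays from data_dir (hypothetical path) into a dict."""
    return {name: np.load(os.path.join(data_dir, name + ".npy")) for name in FILES}

def check_shapes(data):
    """Check every array against the described layout and return (N, H, L, E)."""
    # enc_input is described as H * N * L, so H, N and L are read from it.
    H, N, L = data["enc_input"].shape
    # E is the last axis of the emotion labels (described as N * H * E).
    E = data["enc_input_e"].shape[-1]
    expected = {
        "enc_input": (H, N, L),
        "enc_input_len": (H, N),
        "enc_input_e": (N, H, E),
        "dec_input": (N, L + 1),
        "dec_input_len": (N,),
        "target": (N, L + 1),
        "hist_len": (N,),
    }
    for name, shape in expected.items():
        if data[name].shape != shape:
            raise ValueError(
                f"{name}.npy has shape {data[name].shape}, expected {shape}")
    return N, H, L, E
```

Calling `check_shapes(load_dataset("path/to/data"))` should return the four sizes, or raise a `ValueError` naming the first file that does not match. Note that, as described, `enc_input.npy` puts H on the first axis while `enc_input_e.npy` puts N first, so the two need a transpose before they can be indexed the same way.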