How is the data organized?

QQToyota opened this issue 3 years ago · 1 comment

QQToyota commented 3 years ago

Hello, I read your code but couldn't find a description of the data format. Could you add one?

yuboxie commented 3 years ago

Hello,

I just uploaded the training data (tokenized). Please check the README file.

Briefly speaking, the format is:

enc_input.npy: tokenized dialog context, with shape H * N * L
enc_input_len.npy: utterance lengths of dialog context, with shape H * N
enc_input_e.npy: emotion labels of the dialog context, with shape N * H * E
dec_input.npy: tokenized decoder input, with shape N * (L+1)
dec_input_len.npy: lengths of decoder input, with shape N
target.npy: tokenized target utterance, with shape N * (L+1)
hist_len.npy: lengths of dialog context, with shape N

where

N is the number of data points
H is the maximum dialog context length
L is the maximum utterance length
E is the number of emotion categories
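
To sanity-check the files against the description above, something like the following sketch may help. It assumes all seven `.npy` files sit in a single directory (the directory path, the helper names `load_arrays`, `describe`, and `check_decoder_side`, and the toy sizes are hypothetical, not part of the repository). It only asserts the relations between the N-sized arrays; for the `enc_*` arrays it just reports shapes, so the axis order can be compared with the list above.

```python
import os
import tempfile

import numpy as np

# File base names as listed in the thread; a flat directory layout is an assumption.
FILES = [
    "enc_input", "enc_input_len", "enc_input_e",
    "dec_input", "dec_input_len", "target", "hist_len",
]


def load_arrays(data_dir):
    """Load every listed .npy file into a dict keyed by base name."""
    return {name: np.load(os.path.join(data_dir, name + ".npy")) for name in FILES}


def describe(arrays):
    """Return {name: shape} so the axis order can be inspected."""
    return {name: arr.shape for name, arr in arrays.items()}


def check_decoder_side(arrays):
    """Check that the decoder-side arrays agree on N and on the (L+1) axis."""
    n = arrays["hist_len"].shape[0]
    assert arrays["dec_input_len"].shape == (n,)
    assert arrays["dec_input"].shape[0] == n
    assert arrays["dec_input"].shape == arrays["target"].shape
    return n


# Toy demonstration with made-up sizes (N=4, H=3, L=5, E=2), written with the
# shapes exactly as listed in the thread. Real data will have other sizes.
N, H, L, E = 4, 3, 5, 2
toy = {
    "enc_input": np.zeros((H, N, L), dtype=np.int64),
    "enc_input_len": np.zeros((H, N), dtype=np.int64),
    "enc_input_e": np.zeros((N, H, E), dtype=np.float32),
    "dec_input": np.zeros((N, L + 1), dtype=np.int64),
    "dec_input_len": np.zeros((N,), dtype=np.int64),
    "target": np.zeros((N, L + 1), dtype=np.int64),
    "hist_len": np.zeros((N,), dtype=np.int64),
}

with tempfile.TemporaryDirectory() as tmp:
    for name, arr in toy.items():
        np.save(os.path.join(tmp, name + ".npy"), arr)
    arrays = load_arrays(tmp)
    shapes = describe(arrays)
    n = check_decoder_side(arrays)

for name in FILES:
    print(name, shapes[name])
```

On the real data, replace the temporary directory with the folder holding the uploaded files and compare the printed shapes with the README.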