yuboxie/meed

how is the data organized?

QQToyota opened this issue · 1 comment

Hello, I read your code but couldn't find a description of the data format. Could you add one?

Hello,

I just uploaded the training data (tokenized). Please check the README file.

Briefly, the format is:

  • enc_input.npy: tokenized dialog context, with shape H * N * L
  • enc_input_len.npy: utterance lengths of dialog context, with shape H * N
  • enc_input_e.npy: emotion labels of the dialog context, with shape N * H * E
  • dec_input.npy: tokenized decoder input, with shape N * (L+1)
  • dec_input_len.npy: lengths of decoder input, with shape N
  • target.npy: tokenized target utterance, with shape N * (L+1)
  • hist_len.npy: lengths of dialog context, with shape N

where

  • N is the number of data points
  • H is the maximum dialog context length
  • L is the maximum utterance length
  • E is the number of emotion categories
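
To sanity-check a local copy of the files against this description, something like the sketch below might help. It is not part of the repository: the helper names `load_dataset` and `check_shapes` and the `data_dir` argument are hypothetical, and the axis orders are taken exactly as written in the list above, so any file that deviates will simply show up as a mismatch.

```python
import os

import numpy as np

# File stems as listed in the reply above; each is stored as <stem>.npy.
FILE_STEMS = [
    "enc_input",
    "enc_input_len",
    "enc_input_e",
    "dec_input",
    "dec_input_len",
    "target",
    "hist_len",
]


def load_dataset(data_dir):
    """Load every listed .npy file from data_dir (a hypothetical local path)."""
    return {
        stem: np.load(os.path.join(data_dir, stem + ".npy"))
        for stem in FILE_STEMS
    }


def check_shapes(arrays):
    """Infer N, H, L, E and compare each array with the described shape.

    Assumes the axis orders exactly as written in the list above.
    Returns (dims, mismatches), where mismatches maps a file stem to
    (actual_shape, expected_shape) for every array that disagrees.
    """
    H, N, L = arrays["enc_input"].shape
    E = arrays["enc_input_e"].shape[-1]
    expected = {
        "enc_input": (H, N, L),
        "enc_input_len": (H, N),
        "enc_input_e": (N, H, E),
        "dec_input": (N, L + 1),
        "dec_input_len": (N,),
        "target": (N, L + 1),
        "hist_len": (N,),
    }
    mismatches = {
        stem: (arrays[stem].shape, shape)
        for stem, shape in expected.items()
        if arrays[stem].shape != shape
    }
    return {"N": N, "H": H, "L": L, "E": E}, mismatches
```

Calling `check_shapes(load_dataset("path/to/data"))` would then return the inferred sizes together with any arrays whose shape differs from the description; an empty mismatch dictionary means the files agree with it.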