Dataset layout?

Question

hughperkins opened this issue 7 years ago · 1 comments

Some questions on data:

Is it a fair impression that each of the reviewsx... files, for beer reviews, is laid out as follows?

[look] [smell] [feel] [taste] [overall]        [input words ...]

(I obtained this sequence by comparing the numbers in the dataset for the 'deep brown color with a thin tan head that quickly dissipated' review against the page at https://www.beeradvocate.com/beer/profile/144/30806/?ba=Will_Turner .)
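To make the guessed layout concrete, here is a minimal sketch of a parser for one line. It assumes what this question is asking about, not anything confirmed by the repo: five whitespace-separated numeric scores come first, in the order [look] [smell] [feel] [taste] [overall], and everything after them is the review text. The names `ASPECTS`, `Review` and `parse_review_line` are hypothetical.

```python
from typing import Dict, List, NamedTuple

# Assumed aspect order, taken from the layout guessed in this question.
ASPECTS = ["look", "smell", "feel", "taste", "overall"]


class Review(NamedTuple):
    scores: Dict[str, float]  # aspect name -> score
    words: List[str]          # remaining tokens of the review text


def parse_review_line(line: str) -> Review:
    """Split one line into five leading scores and the remaining words."""
    fields = line.split()  # splits on any whitespace, so tabs or spaces both work
    n = len(ASPECTS)
    if len(fields) < n:
        raise ValueError("expected at least five score fields")
    # float() raises ValueError if a leading field is not numeric,
    # which would indicate the assumed layout is wrong.
    scores = {name: float(value) for name, value in zip(ASPECTS, fields[:n])}
    return Review(scores=scores, words=fields[n:])
```

If the assumed order is right, `parse_review_line(line).scores["look"]` should match the look score shown on the corresponding review page.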

Why are the datasets broken down into 'aspect1', 'aspect2', etc.?
- Is it a fair impression that each of these is the result of the decorrelation described in section 5.1, 'Dataset', for that specific aspect?
- Can I assume that aspect1 is the first aspect as laid out inside the files, i.e. [look]?
- Is this also true for 2 and 3, i.e.:
  - aspect2 is [smell], and
  - aspect3 is [feel]?

Which word vectors are you using? It looks like you are using something 200-dimensional. Maybe GloVe 200, from https://nlp.stanford.edu/projects/glove/, i.e. http://nlp.stanford.edu/data/glove.6B.zip ?
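One way to check the dimensionality guess is to load an embedding file and count the values per word. This is a sketch that assumes the plain-text format used by the GloVe downloads: one word per line followed by space-separated floats. The function names are hypothetical, and the actual embedding file used by the repo is not known here.

```python
from typing import Dict, List, Set


def load_text_vectors(path: str) -> Dict[str, List[float]]:
    """Read 'word v1 v2 ... vN' lines into a dict of word -> vector."""
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def vector_dims(vectors: Dict[str, List[float]]) -> Set[int]:
    """Return the set of distinct vector lengths found in the file."""
    return {len(v) for v in vectors.values()}
```

For a 200-dimensional file, `vector_dims(load_text_vectors(path))` would be expected to contain only 200.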

Answer 1 · 2017-08-13T09:15:16.000Z

Edit: oh right, and annotations.json: is this something like 'ground truth' for which bits of text should ideally be used for each aspect? So it isn't needed for training or dev validation, and is only used for the 'precision' part of table 2? Is this a fair impression?

Edit 2: OK, annotations.json presumably corresponds to this bit: