Dataset layout?
hughperkins opened this issue · 1 comments
hughperkins commented
Some questions on data:
- Is it a fair impression that each of the reviewsx... files, for beer reviews, is laid out as follows?
  [look] [smell] [feel] [taste] [overall] [input words ...]
  (I obtained this sequence from the 'deep brown color with a thin tan head that quickly dissipated' review, by comparing the numbers in the dataset with the page at https://www.beeradvocate.com/beer/profile/144/30806/?ba=Will_Turner )
- why are the datasets broken down into 'aspect1', 'aspect2', etc?
- Is it a fair impression that each of these is the result of the decorrelation described in section 5.1, 'Dataset', for that specific aspect?
- Can I assume that aspect1 is the first aspect, as laid out inside the files, ie [look]? Is this also true for 2 and 3, ie:
  - aspect2 is [smell]?
  - aspect3 is [feel]?
- Which word vectors are you using? It looks like you are using something 200-dimensional. Maybe GloVe 200, from https://nlp.stanford.edu/projects/glove/, ie http://nlp.stanford.edu/data/glove.6B.zip ?
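For concreteness, here is a minimal Python sketch of the layout guessed in the first question: five aspect scores followed by the review tokens. Everything in it is an assumption and not confirmed by the repo: the whitespace separator, the aspect order, the hypothetical helper name `parse_review_line`, and the made-up sample scores.

```python
from typing import Dict, List, NamedTuple

# Assumed aspect order, taken from the guess in the question above.
ASPECTS = ["look", "smell", "feel", "taste", "overall"]


class Review(NamedTuple):
    scores: Dict[str, float]  # aspect name -> score
    words: List[str]          # review tokens


def parse_review_line(line: str) -> Review:
    """Split one line into five aspect scores plus tokens (hypothetical helper).

    Assumes whitespace-separated fields; the real files may use tabs
    or another delimiter.
    """
    fields = line.split()
    if len(fields) < len(ASPECTS):
        raise ValueError("expected at least %d score fields" % len(ASPECTS))
    scores = {name: float(value) for name, value in zip(ASPECTS, fields)}
    return Review(scores=scores, words=fields[len(ASPECTS):])


# Made-up scores, for illustration only; not taken from the dataset.
sample = "0.8 0.6 0.7 0.9 0.8 deep brown color with a thin tan head"
review = parse_review_line(sample)
print(review.scores["look"], review.words[:3])
```

If the guess about aspect1, aspect2 and aspect3 is right, the target for each would simply be `scores["look"]`, `scores["smell"]` and `scores["feel"]` respectively.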
hughperkins commented
Edit: oh right, and annotations.json, is this kind of like 'ground truth' for which bits of text should ideally be used for each aspect? So it isn't needed for training/dev-validation, and is just used for the 'precision' bit of table 2? Is this a fair impression?
Edit 2: ok, the annotations.json presumably corresponds to this bit? :