What dataset does this code use?

Question

Closed this issue 5 years ago · 3 comments

Hi, could you tell me where I can get the data that the code needs?
Thank you for your help!

Answer 1 · 2019-09-03T18:32:16.000Z

Unfortunately, I no longer have the dataset referenced in this code (sst.txt). However, I have also used a different dataset of 2.9M sentences extracted from Yelp reviews. You can download this dataset at this link. You will likely need to modify the _load function in dataset.py to handle the new dataset, although it shouldn't be much work since I have already preprocessed it.
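
For anyone adapting _load, here is a minimal sketch of a loader for an unlabeled file. The thread does not state the format of the Yelp file, so this assumes one whitespace-tokenized sentence per line with no leading label. The function name load_unlabeled_sentences and the placeholder label 0 are hypothetical and only for illustration.

```python
import os
import tempfile


def load_unlabeled_sentences(path, seq_len):
    """Read one whitespace-tokenized sentence per line (assumed format)."""
    reviews = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            # Skip blank lines and apply the same `seq_len - 2` bound as the
            # original _load, minus the label token that is absent here.
            if tokens and len(tokens) <= seq_len - 2:
                reviews.append(tokens)
    # The assumed format has no labels, so return a placeholder 0 per sentence.
    labels = [0] * len(reviews)
    return reviews, labels


# Tiny demonstration on a temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("the food was great\n\n")
    tmp.write("this sentence is far too long to be kept here\n")
    demo_path = tmp.name

demo_reviews, demo_labels = load_unlabeled_sentences(demo_path, seq_len=6)
os.remove(demo_path)
```

Inside the class, the same body could replace _load with self.seq_len in place of the seq_len argument. Whether the rest of the code needs real labels is not clear from this thread.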

Answer 2 · 2019-12-09T03:16:26.000Z

Hi, I used the dataset that you linked earlier, but I ran into an issue after modifying the _load function in dataset.py. I changed it from

def _load(self):
        with open('sst.txt','r') as f:
            sents = [x for x in f.read().split('\n') if \
                     len(x.split())-1<=self.seq_len-2]
            reviews = [x.split()[1:] for x in sents]
            labels = [int(x.split()[0]) for x in sents]
        return (reviews, labels)

to

def _load(self):
        with open('sst.txt','r') as f:
            sents = [x for x in f.read().split('\n') if \
                     len(x.split())-1<=self.seq_len-2]
            reviews = [x.split() for x in sents]
            labels = [i for i in range(len(sents))]
        return (reviews,labels)#(reviews, labels)

and I ended up receiving an error, as shown in the image below.

Do you mind helping me with this?

Answer 3 · 2019-12-09T03:18:54.000Z

I suspect the inputs need to have a long dtype. Try changing the line shown in the stack trace to x = self.embedding(x.long()).permute(1,0,2). Also, if you are using a GPU, move the tensors to CUDA with .cuda().
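
To illustrate the dtype point, here is a small standalone sketch. It is not the model from this repository: the embedding sizes and the input tensor are made-up toy values. It shows that nn.Embedding rejects float indices and accepts them once cast with .long().

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy sizes, chosen only for this example.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4).to(device)

# Token indices stored as floats, shape (batch=3, seq_len=2).
x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]).to(device)

try:
    embedding(x)  # float indices are rejected by the embedding lookup
    float_ok = True
except RuntimeError:
    float_ok = False

# Casting to long fixes the lookup; permute swaps the batch and sequence axes.
out = embedding(x.long()).permute(1, 0, 2)  # (seq_len, batch, embedding_dim)
```

Using .to(device) rather than .cuda() keeps the same script runnable on machines without a GPU.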