Shawn1993/cnn-text-classification-pytorch

why * 1000?

kayzhou opened this issue · 1 comment

In data.py

    def __next__(self):
        self._fill_buffer(self._batch_size * 1000)

Here, why is `_batch_size` multiplied by 1000?

    def _fill_buffer(self, size):
        if not self._buffer:
            for line in self._file:
                label, sentence = line.split("\t")
                label = int(label.strip())
                sequence = [self._vocab.token_to_id(t) for t in sentence.strip().split()]
                self._buffer.append((label, sequence))
                if len(self._buffer) >= size:
                    break

            self._buffer.sort(key=lambda x: len(x[1]))
            self._buffer_iter = iter(self._buffer)

Using `if not self._buffer:` to guard the filling, I think, is wrong: `self._buffer` is never cleared in `__next__()`, so `_fill_buffer` only ever runs once and the rest of the file is never read.
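For context, multiplying by 1000 looks like a common "bucketing" trick: read a large chunk (1000 batches' worth), sort it by sentence length so each batch contains similar-length sequences and needs little padding, then iterate batches out of the sorted chunk. A minimal, self-contained sketch of that pattern — the function and parameter names here are illustrative, not from the repo:

```python
def bucketed_batches(items, batch_size, bucket_factor=1000):
    """Yield batches of similar-length items.

    Reads up to batch_size * bucket_factor items at a time, sorts the
    chunk by length so padding within each batch is minimal, then emits
    batch_size-sized batches from the sorted chunk.
    """
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= batch_size * bucket_factor:
            buffer.sort(key=len)
            for i in range(0, len(buffer), batch_size):
                yield buffer[i:i + batch_size]
            buffer = []  # cleared so the next chunk gets read
    if buffer:  # flush the final partial chunk
        buffer.sort(key=len)
        for i in range(0, len(buffer), batch_size):
            yield buffer[i:i + batch_size]
```

Note the buffer is cleared after each chunk is consumed, which is exactly the step the original `_fill_buffer` seems to be missing.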