why * 1000?
kayzhou opened this issue · 1 comments
kayzhou commented
In data.py
def __next__(self):
    self._fill_buffer(self._batch_size * 1000)
Why is _batch_size multiplied by 1000 here?
kayzhou commented
def _fill_buffer(self, size):
    if not self._buffer:
        for line in self._file:
            label, sentence = line.split("\t")
            label = int(label.strip())
            sequence = [self._vocab.token_to_id(t) for t in sentence.strip().split()]
            self._buffer.append((label, sequence))
            if len(self._buffer) >= size:
                break
        self._buffer.sort(key=lambda x: len(x[1]))
        self._buffer_iter = iter(self._buffer)
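I think using self._buffer to decide whether to fill is wrong. self._buffer never changes in __next__(), so after the first fill the check "if not self._buffer" is always false and the buffer is never refilled.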
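To make the concern concrete, here is a minimal sketch that reproduces the behaviour on a tiny in-memory file. It is not the real data.py: the class BufferedBatches, its factor and reset_when_drained parameters, the _take_batch helper, and the body of __next__ are all assumptions made for illustration, and tokens are kept as strings instead of going through a vocabulary. Only the guard in _fill_buffer follows the code quoted above. One possible fix, again just an assumption, is to empty the list once its iterator is drained so the guard can pass again.

```python
import io


class BufferedBatches:
    """Hypothetical stand-in for the iterator in data.py (tokens kept as strings)."""

    def __init__(self, file, batch_size, factor, reset_when_drained):
        self._file = file
        self._batch_size = batch_size
        self._factor = factor  # plays the role of the 1000 in the issue
        self._reset_when_drained = reset_when_drained
        self._buffer = []
        self._buffer_iter = iter(self._buffer)

    def _fill_buffer(self, size):
        # Same guard as in the issue: refill only if the list itself is empty.
        if not self._buffer:
            for line in self._file:
                label, sentence = line.split("\t")
                self._buffer.append((int(label.strip()), sentence.strip().split()))
                if len(self._buffer) >= size:
                    break
            # Sort by sentence length so neighbouring examples have similar lengths.
            self._buffer.sort(key=lambda x: len(x[1]))
            self._buffer_iter = iter(self._buffer)

    def _take_batch(self):
        # Hypothetical helper: pull up to batch_size items from the buffer iterator.
        batch = []
        for item in self._buffer_iter:
            batch.append(item)
            if len(batch) >= self._batch_size:
                break
        return batch

    def __iter__(self):
        return self

    def __next__(self):
        self._fill_buffer(self._batch_size * self._factor)
        batch = self._take_batch()
        if not batch and self._reset_when_drained:
            # Assumed fix: empty the list so the guard in _fill_buffer passes again.
            self._buffer = []
            self._fill_buffer(self._batch_size * self._factor)
            batch = self._take_batch()
        if not batch:
            raise StopIteration
        return batch


def count_examples(reset_when_drained):
    # Ten lines of "label<TAB>sentence", with sentences of increasing length.
    data = io.StringIO("".join(f"{i % 2}\tw{' w' * i}\n" for i in range(10)))
    reader = BufferedBatches(data, batch_size=2, factor=2,
                             reset_when_drained=reset_when_drained)
    return sum(len(batch) for batch in reader)


print(count_examples(reset_when_drained=False))  # 4: only the first buffer is ever served
print(count_examples(reset_when_drained=True))   # 10: every line in the file is served
```

With the guard alone, iteration stops after the first buffer (4 of 10 examples here). With the reset, the whole file is consumed. Whether the real __next__ already resets the buffer somewhere else cannot be told from the snippets in this issue.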