ShiningLab/POS-Tagger-for-Punctuation-Restoration

ValueError: not enough values to unpack (expected 4, got 3)

Kailash-Natarajan opened this issue · 3 comments

I get this error while training on Google Colab. It started happening recently. Any suggestions or solutions for fixing it?
Training works fine with POS tagging disabled, since that skips the problematic code segment.

If I observed correctly, it happens only for trainset_generator and not for validset_generator, although I am not certain of this.

Training...
  0%|          | 0/2237 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-6a92f2de38ba> in <module>()
      7 re=Restorer()
----> 8 re.train()

7 frames
/content/drive/MyDrive/ColabNotebooks/funnel/main/train.py in train(self)
    230             # training set data loader
    231             trainset_generator = tqdm(self.trainset_generator)
--> 232             for data in trainset_generator:
    233                 raw_data, train_data = data
    234                 train_data = (torch.LongTensor(i).to(self.config.device) for i in train_data)

/usr/local/lib/python3.7/dist-packages/tqdm/std.py in __iter__(self)
   1183 
   1184         try:
-> 1185             for obj in iterable:
   1186                 yield obj
   1187                 # Update and possibly print the progressbar.

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
    559     def _next_data(self):
    560         index = self._next_index()  # may raise StopIteration
--> 561         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    562         if self._pin_memory:
    563             data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/content/drive/MyDrive/ColabNotebooks/funnel/main/src/utils/pipeline.py in __getitem__(self, idx)
    188             idx = random.choice(list(self.sample_pool))
    189             self.sample_pool.remove(idx)
--> 190             return generate_seq(self.raw_data[idx:idx+self.config.max_seq_len-2], self.config)
    191         else:
    192             if self.config.use_pos:

/content/drive/MyDrive/ColabNotebooks/funnel/main/src/utils/pipeline.py in generate_seq(lines, config)
     90     for l in lines:
     91         if config.use_pos:
---> 92             token, pun, mask, tag = l
     93         else:
     94             # token, pun, mask, tag

ValueError: not enough values to unpack (expected 4, got 3)

A possible reason for this is that the tag item is missing from one line of the data, which can happen if the POS dataset was not fully generated on the first run. I would suggest printing l at this point to see exactly what it contains, and taking a look at the dataset to check the structure and content of a single line.
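
Here is a minimal, self-contained sketch of the failure mode and of the kind of check I mean; the sample lines below are hypothetical and only illustrate the expected (token, pun, mask, tag) shape of one data line, not the repository's actual data.

# Hypothetical sample lines; each is expected to hold (token, pun, mask, tag).
lines = [
    ["word", "O", 1, "NN"],   # well-formed: 4 items
    ["word", "O", 1],         # malformed: the tag item is missing
]

# Locate malformed entries before unpacking, e.g. by printing each l:
for i, l in enumerate(lines):
    if len(l) != 4:
        print(f"line {i} has {len(l)} items instead of 4: {l!r}")

# Unpacking a 3-item line is exactly what raises the reported error:
try:
    token, pun, mask, tag = lines[1]
except ValueError as e:
    print(e)  # not enough values to unpack (expected 4, got 3)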

Issue closed due to no further feedback in one week.

Sorry, I was caught up. Anyway, I just looked through it. I had to regenerate the .json files with the POS tagger enabled, and it works fine now. Thanks!
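
For anyone hitting the same thing, a quick sanity check on the regenerated files can confirm that every line carries the POS tag; the file name and nesting below are assumptions for illustration, not the repository's exact layout.

import json

# Hypothetical path to one regenerated data file; adjust to your setup.
with open("train.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Assuming the file holds a list of lines, each expected to be
# (token, pun, mask, tag) when POS tagging is enabled.
bad = [line for line in data if len(line) != 4]
print(f"{len(bad)} malformed lines" if bad else "all lines have 4 fields")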