fajri91/RSTExtractor

IndexError for .merge files containing more than 512 EDUs

Closed this issue · 2 comments

When running extract_tree.py to extract the RST trees, the following IndexError is raised for input .merge files containing more than 512 EDUs.

Traceback (most recent call last):
  File "/usr/local/python/2.7.12-gcc5/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/python/2.7.12-gcc5/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "extract_tree.py", line 115, in run_thread
    tree = rst.get_subtree(rst_data)[0]
  File "RSTExtractor/rst_model.py", line 86, in get_subtree
    results = self.network.decode(encoder_output, [], [], len_edus)
  File "RSTExtractor/NeuralRST/models/architecture.py", line 312, in decode
    self.move(predicted_actions)
  File "RSTExtractor/NeuralRST/models/architecture.py", line 262, in move
    next_state = self.batch_states[idx][step + 1]
IndexError: list index out of range

The cause is that batch_states is initialised with a fixed size of 1024 in NeuralRST/models/architecture.py. Since a shift-reduce derivation over n EDUs takes roughly 2n − 1 transitions, a 1024-state buffer runs out once an input exceeds about 512 EDUs. May I know if there was a reason for this choice? I was able to fix the issue by sizing the number of states per batch according to len_edus in MainArchitecture.decode, but it would be good to confirm. Thanks!
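For illustration, here is a minimal sketch of the fix described above: allocate the per-batch state list from the EDU count instead of a hard-coded 1024. The names (State, batch_states, len_edus) echo the traceback, but the surrounding code and the allocate_states helper are assumptions, not copied from the repository.

```python
class State(object):
    """Placeholder for NeuralRST's parser state object (assumed)."""
    pass


def allocate_states(num_edus, batch_size=1, floor=1024):
    """Build batch_states with enough slots for the longest derivation.

    A shift-reduce derivation over n EDUs needs about 2*n - 1 transitions
    (n shifts, n - 1 reduces), so 2*n states is a safe upper bound. The
    floor of 1024 preserves the original behaviour for short inputs.
    """
    max_steps = max(floor, 2 * num_edus)
    return [[State() for _ in range(max_steps)] for _ in range(batch_size)]


# A 600-EDU document now gets 1200 states instead of overflowing at 1024.
batch_states = allocate_states(600)
```

The same idea can be applied inside MainArchitecture.decode by computing max_steps from len_edus before the transition loop begins.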

Hi kay-wong,

Ah yes, that should fix the issue. There is no particular reason to set it to 1024; I followed the original implementation https://github.com/yunan4nlp/NNDisParser, which also uses 1024.

In my own training and use of the RST parser, a size of 1024 was more than enough.

Regards, Fajri

Hi Fajri,

Great, thanks for confirming!