IndexError for .merge files containing more than 512 EDUs
Closed this issue · 2 comments
When running extract_tree.py to extract the RST trees, the following IndexError is raised for input .merge files with more than 512 EDUs.
Traceback (most recent call last):
File "/usr/local/python/2.7.12-gcc5/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/python/2.7.12-gcc5/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "extract_tree.py", line 115, in run_thread
tree = rst.get_subtree(rst_data)[0]
File "RSTExtractor/rst_model.py", line 86, in get_subtree
results = self.network.decode(encoder_output, [], [], len_edus)
File "RSTExtractor/NeuralRST/models/architecture.py", line 312, in decode
self.move(predicted_actions)
File "RSTExtractor/NeuralRST/models/architecture.py", line 262, in move
next_state = self.batch_states[idx][step + 1]
IndexError: list index out of range
The reason for this is that the size of each entry in batch_states is initialised to a fixed 1024 in NeuralRST/models/architecture.py. May I know if there was a reason for this choice? I was able to fix the issue by sizing the number of states per batch according to len_edus in MainArchitecture.decode, but it would be good to confirm. Thanks!
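For reference, a minimal sketch of the fix described above. The function and helper names here are hypothetical (only MainArchitecture.decode and batch_states appear in the traceback); the arithmetic assumes a shift-reduce parser, which performs 2n - 1 actions for n EDUs (n shifts plus n - 1 reduces) and therefore visits up to 2n states, so a fixed cap of 1024 covers at most 512 EDUs:

```python
# Hypothetical sketch: size the per-example state list from the number
# of EDUs instead of using a hard-coded 1024.

def required_states(len_edus):
    """States needed for a document with len_edus EDUs (2n, floored at 1024)."""
    return max(2 * len_edus, 1024)  # keep the original 1024 as a minimum

def init_batch_states(batch_len_edus, make_state):
    # make_state() is an assumed helper that builds one empty parser state;
    # returns one state list per example, sized to that example's EDU count.
    return [[make_state() for _ in range(required_states(n))]
            for n in batch_len_edus]
```

With this sizing, `step + 1` in `move` can never index past the end of a state list, because the decoder takes at most 2n - 1 steps for n EDUs.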
Hi kay-wong,
Ah yes, that should fix the issue. There is no particular reason for setting it to 1024. I followed the original implementation https://github.com/yunan4nlp/NNDisParser, where they use 1024 as well. When I trained and used the RST parser, a size of 1024 was more than enough.
Regards, Fajri
Hi Fajri,
Great, thanks for confirming!