"Length of values does not match length of index" for beam_size >1
Closed this issue · 4 comments
Hi,
I am running score_predictions.py; however, the operation exits with the error below.
mol_transformer/bin/python score_predictions.py -targets data/raw/tgt-test.txt -predictions experiments/results/raw_results/predictions_raw_model_step_129000_on_raw_test.txt
Traceback (most recent call last):
File "score_predictions.py", line 73, in <module>
main(opt)
File "score_predictions.py", line 38, in main
test_df['prediction_{}'.format(i + 1)] = preds
File "/home/user/miniconda3/envs/mol_transformer/lib/python3.6/site-packages/pandas/core/frame.py", line 2938, in __setitem__
self._set_item(key, value)
File "/home/user/miniconda3/envs/mol_transformer/lib/python3.6/site-packages/pandas/core/frame.py", line 3000, in _set_item
value = self._sanitize_column(key, value)
File "/home/user/miniconda3/envs/mol_transformer/lib/python3.6/site-packages/pandas/core/frame.py", line 3636, in _sanitize_column
value = sanitize_index(value, self.index, copy=False)
File "/home/user/miniconda3/envs/mol_transformer/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 611, in sanitize_index
raise ValueError("Length of values does not match length of index")
I have checked the src-test.txt, tgt-test.txt, and predictions.txt files, and they all contain the same number of observations. The script runs fine if I pass `-beam_size 1` but fails when I use any other integer.
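For context, the error itself is a generic pandas one: assigning a column whose length differs from the DataFrame's index always raises it. With `-beam_size` > 1 the predictions file holds multiple lines per target, so the list passed to `test_df['prediction_...']` is a multiple of the index length. A minimal sketch reproducing the mismatch (the SMILES strings here are made-up placeholders, not from the actual data files):

```python
import pandas as pd

targets = ["CCO", "CCN"]              # 2 ground-truth reactions
preds = ["CCO", "CCC", "CCN", "CC"]   # 4 lines: beam output not split per rank

test_df = pd.DataFrame({"target": targets})
try:
    test_df["prediction_1"] = preds   # 4 values for a 2-row index
except ValueError as e:
    print(e)  # e.g. "Length of values ... does not match length of index"
```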
I think I got it! :)
Reopened. I put `-beam_size 10` in both the translate.py call and the score_predictions.py call, thinking this would work; however, I still get the issue.
You'll find all the options in:
https://github.com/pschwllr/MolecularTransformer/blob/master/onmt/opts.py
Or when you do:
python translate.py --help
If you put `-beam_size 10`, you still have to change `-n_best` to the number of outputs you want per prediction. For example, `-n_best 3` would lead to an output like:
reaction1 top1
reaction1 top2
reaction1 top3
reaction2 top1
reaction2 top2
reaction2 top3
In score_predictions.py you then have to pass `-beam_size 3`, so that it always considers 3 prediction lines per ground-truth reaction. I should probably have called the option `-n_best` in score_predictions.py.
It might be a bit confusing at first, but the `-beam_size` parameter in the scoring script should match the number of outputs per ground-truth reaction.
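The layout described above can be regrouped with simple stride slicing. This is a hypothetical sketch (the function name `split_predictions` is illustrative, not from the actual script) of how a flat file with `n_best` lines per reaction maps onto one column per rank:

```python
def split_predictions(lines, n_best):
    """Group a flat list of n_best * k prediction lines into n_best lists,
    where list i holds the rank-(i+1) prediction for every reaction."""
    assert len(lines) % n_best == 0, "file length must be a multiple of n_best"
    return [lines[i::n_best] for i in range(n_best)]

lines = ["reaction1 top1", "reaction1 top2", "reaction1 top3",
         "reaction2 top1", "reaction2 top2", "reaction2 top3"]
cols = split_predictions(lines, n_best=3)
# cols[0] -> ["reaction1 top1", "reaction2 top1"]   (top-1 predictions)
# cols[2] -> ["reaction1 top3", "reaction2 top3"]   (top-3 predictions)
```

Each list in `cols` then has exactly one entry per ground-truth reaction, which is why the scoring value must match the number of outputs per reaction.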
Hi Philippe,
Thank you for the clear and concise explanation, the script works fine now.
P.S. I also removed `-fast` from the translate args, since it seemed to conflict with `-n_best` > 1.
Best,
Dean