Generation error: list index out of range

Question

Eterance opened this issue 2 years ago · 5 comments

Hi! It's me again.🤣

When I run the annotation script on the wikitq dataset with the following command and arguments:

python -u scripts/annotate_binder_program.py --dataset wikitq \
    --dataset_split test \
    --prompt_file templates/prompts/wikitq_binder.txt \
    --n_parallel_prompts 1 \
    --n_processes 2 \
    --max_generation_tokens 512 \
    --temperature 0.4 \
    --sampling_n 20 \
    -v

The error "generation error: list index out of range" occurred in 76 samples. I debugged the first 5 of them (wtqid#nu-30, 208, 263, 279, 367) and found that the input still exceeded the length limit even after n_shot was reduced to 0.

scripts/annotate_binder_program.py

The list few_shot_prompt_list is empty and this step will throw the exception.

generation/generator.py

Will the missing results of these 76 samples have any effect on the execution stage? How should I solve this problem?

Thanks!

Answer 1 · 2022-12-11T19:33:58.000Z

OK, could you check the total number of tokens in the prompt when the error is thrown?

Answer 2 · 2022-12-11T19:44:56.000Z

That is, check whether the variable prompt is over 8001 tokens (the maximum token length of code-davinci-002). If it is, then check why this error is being thrown. As far as I remember, we didn't write code that raises this error ourselves.

Answer 3 · 2022-12-12T06:18:24.000Z

Thanks for the reply!

I have included the values of some local variables in the left panel of the screenshots above; the breakpoint location (the yellow highlighted lines) is where the error was thrown.

As screenshot 1 shows, in sample wtqid#nu-208, even after n_shot was reduced to 0, the total number of prompt tokens ( len(tokenizer.tokenize(prompt)) ) is 10259, still over max_prompt_tokens = 7489 ( 8001 - 512 ).
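The budget arithmetic above can be sketched as follows. This is a minimal illustration, not the repository's code: the function names and the whitespace tokenizer are assumptions made so the example runs on its own, and a real check would use the same tokenizer object as the annotate script.

```python
# Figures quoted in this thread; everything else here is a hypothetical sketch.
MAX_CONTEXT_TOKENS = 8001     # limit quoted for code-davinci-002
MAX_GENERATION_TOKENS = 512   # --max_generation_tokens from the command


def max_prompt_tokens(context_limit=MAX_CONTEXT_TOKENS,
                      generation_tokens=MAX_GENERATION_TOKENS):
    """Tokens left for the prompt once the generation budget is reserved."""
    return context_limit - generation_tokens


def fits_budget(prompt, tokenize, budget=None):
    """Return True if the tokenized prompt fits in the prompt budget."""
    if budget is None:
        budget = max_prompt_tokens()
    return len(tokenize(prompt)) <= budget


def whitespace_tokenize(text):
    """Stand-in tokenizer (assumption): splits on whitespace only."""
    return text.split()


if __name__ == "__main__":
    print(max_prompt_tokens())
    # A prompt with 10259 stand-in tokens, like sample wtqid#nu-208
    long_prompt = " ".join(["tok"] * 10259)
    print(fits_budget(long_prompt, whitespace_tokenize))
```

With n_shot already at 0, a prompt that fails this check cannot be shortened by dropping more few-shot examples, so the table or question itself would have to be truncated or the sample skipped.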

As screenshot 2 shows, because n_shot = 0, the list few_shot_prompt_list in method generator.build_few_shot_prompt_from_file() has no elements, so the line few_shot_prompt_list[-1] = few_shot_prompt_list[-1].strip() throws this exception.
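The failure described here can be reproduced in isolation, and one possible guard is shown next to it. This is a sketch under assumptions: only the variable name few_shot_prompt_list and the indexed assignment come from the thread, and the helper functions are hypothetical, not the project's actual fix.

```python
def reproduce_error():
    """Run the quoted statement on an empty list and return the error text."""
    few_shot_prompt_list = []
    try:
        few_shot_prompt_list[-1] = few_shot_prompt_list[-1].strip()
    except IndexError as exc:
        return str(exc)
    return None


def strip_last_shot(few_shot_prompt_list):
    """Hypothetical guard: strip the last prompt only if the list is non-empty."""
    if few_shot_prompt_list:
        few_shot_prompt_list[-1] = few_shot_prompt_list[-1].strip()
    return few_shot_prompt_list


if __name__ == "__main__":
    print(reproduce_error())
    print(strip_last_shot([]))
    print(strip_last_shot(["shot 1\n\n", " shot 2 \n\n"]))
```

A guard like this only removes the IndexError. A prompt that is over the token budget with zero shots would still be too long for the model.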

The exception is caught by the outer except in method worker_annotate() of scripts/annotate_binder_program.py, as shown in the screenshot below.
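The catch-and-continue pattern described here might look roughly like the sketch below. It is not the body of worker_annotate(); annotate_all, toy_annotate, and the sample dictionary are hypothetical names used to show how failed samples can be recorded so that a later stage knows which ids have no generated programs.

```python
def annotate_all(samples, annotate_one):
    """Annotate each sample, recording failures instead of stopping the run."""
    results, failed = {}, {}
    for sample_id, sample in samples.items():
        try:
            results[sample_id] = annotate_one(sample)
        except Exception as exc:  # broad on purpose, mirroring an outer except
            failed[sample_id] = f"{type(exc).__name__}: {exc}"
    return results, failed


def toy_annotate(sample):
    """Toy annotator (assumption): fails on inputs longer than 10 characters."""
    if len(sample) > 10:
        raise IndexError("list index out of range")
    return sample.upper()


if __name__ == "__main__":
    samples = {"nu-1": "short", "nu-208": "an extremely long table"}
    results, failed = annotate_all(samples, toy_annotate)
    print(results)
    print(failed)
```

Keeping the failed ids in a separate mapping makes it straightforward to count the skipped samples (76 in this report) when interpreting the execution-stage results.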

Answer 4 · 2022-12-12T09:08:16.000Z

I see, so these exceptions were caught.
Then I think this is expected. As I recall, wikitq does have some extremely long examples that are too long for the OpenAI Codex model. It won't actually affect the result much.

Answer 5 · 2022-12-12T10:14:54.000Z

Got it. Thank you for your reply!