hao-ai-lab/Consistency_LLM

Reproduce Table 1 with this repo?

dreaming-panda opened this issue · 6 comments

Hello, thanks for your nice work.

I want to reproduce Table 1 (i.e., the accuracy on GSM8K and ShareGPT), but I cannot find scripts to do this.

Could you point me in the right direction?

Thank you!

Thanks for your interest in our work! For Table 1, we follow the same settings as HumanEval, Spider, MT-bench, and GSM8K to evaluate the CLLMs' generation quality, but with Jacobi decoding instead of conventional AR decoding.

Output generation code for GSM8K and ShareGPT (using Jacobi decoding) has just been uploaded under the eval/gsm8k and eval/mt-bench directories. You can use the scripts to generate outputs and then follow MT-bench's instructions to complete the evaluation.
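
For anyone following along, the flow is: run the generation script with Jacobi decoding, then score the outputs per MT-bench's instructions. Here is a rough sketch of the first step, driven from Python; the script name and model path are my own placeholders (check eval/gsm8k for the actual entry point), while max_new_tokens and max_seq_len are the arguments discussed further down in this thread:

```python
# Sketch of driving the generation step from Python. The script name and
# model path are illustrative placeholders, NOT the repo's actual entry
# points -- see eval/gsm8k for the real script and its flags.
import subprocess

subprocess.run(
    [
        "python", "eval/gsm8k/generate.py",         # placeholder script name
        "--model_path", "path/to/cllm-checkpoint",  # placeholder path/flag
        "--max_new_tokens", "16",  # n-token block size for Jacobi iteration
        "--max_seq_len", "1024",   # total generation length
    ],
    check=True,  # raise CalledProcessError if the script fails
)
# Afterwards, follow MT-bench's instructions to judge the generated outputs.
```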

Thank you for your patient response!

Thanks for your good work.

I tried to run the GSM8K scripts to reproduce the Table 1 results. However, I got the final results shown in the figure below.
[screenshot: GSM8K evaluation results]
The performance of CLLM with Jacobi is much lower.

Do you have any idea where I might have made a mistake? Thanks a lot.

@agentup Could you provide some more information about your setup? What hardware are you running on, and what command arguments did you use?

I noticed you are using max_new_tokens=512; this could be why you are not seeing a speedup: you are iterating over an n-token sequence of size 512, which introduces a lot of compute overhead. For GSM8K, please change max_new_tokens to 16 or 32 for a good speedup.

Note that in this repo, the command arguments have the following meanings:
max_new_tokens: the n-token sequence size for Jacobi trajectory generation.
max_seq_len: the total model generation length.
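
To see why this matters: each Jacobi iteration re-predicts the entire n-token block with one forward pass, so max_new_tokens=512 makes every iteration roughly as costly as a 512-token prefill. Here is a minimal sketch of the iteration (not the repo's implementation; it assumes a HuggingFace-style model whose forward call returns .logits and exposes config.vocab_size):

```python
import torch

@torch.no_grad()
def jacobi_decode_block(model, prefix_ids, n_tokens):
    """Refine a random n-token guess in parallel until it hits a fixed point."""
    vocab = model.config.vocab_size  # assumes a HuggingFace-style config
    guess = torch.randint(0, vocab, (1, n_tokens), device=prefix_ids.device)
    for _ in range(n_tokens):  # Jacobi needs at most n iterations to converge
        ids = torch.cat([prefix_ids, guess], dim=-1)
        logits = model(ids).logits  # one forward over prefix + n-token block
        # Each block position is re-predicted from everything to its left.
        new_guess = logits[:, prefix_ids.shape[-1] - 1 : -1, :].argmax(dim=-1)
        if torch.equal(new_guess, guess):  # fixed point == AR greedy output
            break
        guess = new_guess
    return guess
```

With a small block (16 or 32), each iteration is cheap, so collapsing several sequential AR steps into one parallel refinement actually shows up as wall-clock speedup.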

Thanks for your help. The problem has been solved.