tangqiaoyu/ToolAlpaca

Evaluation Metrics

Thanks for the great work. I'm wondering how the evaluation metrics in Table 3 are calculated. For example, I have the following evaluation result from GPT-4.

"statistics": { "num": 100, "error_num": 30, "process": { "Yes": 51, "No": 18, "Uncertain": 1 }, "response": { "Yes": 59, "No": 6, "Uncertain": 5 }, "both": 50 }

Which denominator should I use when calculating the evaluation metrics: 100 or 70?

For example, is process_correct_rate = 51/100 or 51/70? Kindly let me know. Thanks.

Thank you for your question. When calculating the evaluation metrics, we use 100 as the denominator. This means every instance that failed to produce a final process/response is counted as an error.
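As a rough illustration of the answer above, the sketch below computes the rates from the quoted statistics using `num` (100) as the denominator rather than `num - error_num` (70). The helper name `correct_rates` is hypothetical and not part of the ToolAlpaca code. Reading `both` as the count of instances judged "Yes" on both process and response is an assumption the thread does not confirm.

```python
import json

# Statistics block quoted in the question.
raw = """{"statistics": {"num": 100, "error_num": 30,
  "process": {"Yes": 51, "No": 18, "Uncertain": 1},
  "response": {"Yes": 59, "No": 6, "Uncertain": 5},
  "both": 50}}"""


def correct_rates(stats):
    """Hypothetical helper: "Yes" rates with the total instance count as denominator."""
    total = stats["num"]  # 100, so the 30 error instances count against the rate
    return {
        "process": stats["process"]["Yes"] / total,
        "response": stats["response"]["Yes"] / total,
        # Assumption: "both" counts instances judged "Yes" on process and response.
        "both": stats["both"] / total,
    }


stats = json.loads(raw)["statistics"]
rates = correct_rates(stats)
print(rates)  # {'process': 0.51, 'response': 0.59, 'both': 0.5}
```

With these numbers, process_correct_rate is 51/100 = 0.51, not 51/70 ≈ 0.73. Note that the Yes/No/Uncertain counts in each category sum to 70, which equals `num - error_num`.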

How is an instance determined to be an error, and how is error_num calculated? I don't see this in the GPT3 evaluation prompt.

What does "both" mean?