Evaluation Metrics

Question

Closed this issue a year ago · 3 comments

mattgithub1919 commented a year ago

Thanks for the great work. I'm wondering how the evaluation metrics in Table 3 are calculated. For example, I have the following evaluation result from GPT-4.

"statistics": { "num": 100, "error_num": 30, "process": { "Yes": 51, "No": 18, "Uncertain": 1 }, "response": { "Yes": 59, "No": 6, "Uncertain": 5 }, "both": 50 }

May I know which denominator I should use when calculating the evaluation metrics: 100 or 70?

For example, should process_correct_rate be 51/100 or 51/70? Kindly let me know. Thanks.

Answer 1 · 2023-10-14T09:10:52.000Z

Thank you for your question. When calculating the evaluation metrics, we use 100 (the total number of instances) as the denominator. This means that all instances for which no final process/response could be obtained are treated as errors, so in your example process_correct_rate = 51/100.
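A minimal Python sketch of the calculation described in the answer above, applied to the statistics from the question. The function name `rates` is hypothetical, and reading `both` as "instances judged Yes for both process and response" is an assumption, not something confirmed in this thread. The sanity check only verifies that the counts in this particular example add up; it is not a claim about how error_num is computed in general.

```python
# Statistics reported in the question (GPT-4 run).
stats = {
    "num": 100,
    "error_num": 30,
    "process": {"Yes": 51, "No": 18, "Uncertain": 1},
    "response": {"Yes": 59, "No": 6, "Uncertain": 5},
    "both": 50,
}


def rates(stats):
    """Hypothetical helper: correctness rates with all instances ("num") as the denominator.

    Instances counted in error_num never receive a "Yes" verdict, so they
    lower every rate instead of being excluded from the denominator.
    """
    total = stats["num"]
    return {
        "process": stats["process"]["Yes"] / total,
        "response": stats["response"]["Yes"] / total,
        # Assumption: "both" counts instances judged Yes on process AND response.
        "both": stats["both"] / total,
    }


# Sanity check for this example: judged instances plus errors add up to num.
for key in ("process", "response"):
    judged = sum(stats[key].values())
    assert judged + stats["error_num"] == stats["num"]

result = rates(stats)
print(result)  # {'process': 0.51, 'response': 0.59, 'both': 0.5}
```

Dividing by 70 instead (51/70, roughly 0.73) would silently drop the 30 failed instances and overstate the model's accuracy, which is why the answer counts them as errors.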

Answer 2 · 2023-11-13T06:34:10.000Z

How is an instance determined to be an error, and how is error_num calculated? I don't see this in the GPT-3 evaluation prompt.

Answer 3 · 2023-11-13T06:35:11.000Z

What does "both" mean?