Evaluation Metrics
Closed this issue · 3 comments
Thanks for the great work. I'm wondering how the evaluation metrics in Table 3 are calculated. For example, I have the following evaluation result from GPT-4.
"statistics": { "num": 100, "error_num": 30, "process": { "Yes": 51, "No": 18, "Uncertain": 1 }, "response": { "Yes": 59, "No": 6, "Uncertain": 5 }, "both": 50 }
May I know which denominator I should use when calculating the evaluation metrics: 100 or 70?
For example, is process_correct_rate = 51/100 or 51/70? Kindly let me know. Thanks.
Thank you for your question. When calculating the evaluation metrics, we use 100 as the denominator. This means all instances for which a final process/response judgment could not be obtained are counted as errors.
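For concreteness, the calculation described above can be sketched as follows. This is a minimal illustration, not code from the repository; the function name `correct_rates` and the dict layout are taken from the statistics shown in the question.

```python
def correct_rates(statistics):
    """Compute correct rates using the full instance count ("num") as the
    denominator, so instances without a parsed judgment implicitly count
    against the rate (i.e., as errors)."""
    n = statistics["num"]
    return {
        "process_correct_rate": statistics["process"]["Yes"] / n,
        "response_correct_rate": statistics["response"]["Yes"] / n,
        "both_correct_rate": statistics["both"] / n,
    }

# The statistics from the question above:
stats = {
    "num": 100, "error_num": 30,
    "process": {"Yes": 51, "No": 18, "Uncertain": 1},
    "response": {"Yes": 59, "No": 6, "Uncertain": 5},
    "both": 50,
}

print(correct_rates(stats))
# → {'process_correct_rate': 0.51, 'response_correct_rate': 0.59, 'both_correct_rate': 0.5}
```

With 100 as the denominator, process_correct_rate is 51/100 = 0.51 rather than 51/70 ≈ 0.73.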
How is an instance determined to be an error, and how is error_num calculated? I don't see this in the GPT-3 evaluation prompt.
What does "both" mean?