Instruction-Tuning-with-GPT-4/GPT-4-LLM

The quality of comparision_data.json does not seem good

xiangrongzeng opened this issue · 1 comment

I'm exploring the comparision_data.json dataset and found that completion_a is not always better than completion_b (often it isn't). According to the introduction of comparision_data.json, completion_a is supposed to be ranked higher.
Does anyone else feel the same?

Here is an example, which is the second instance in this dataset. I think completion_b is better than completion_a.

{
"user_input": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the three primary colors?\n\n### Response:",
"completion_a": "The three primary colors are red, blue, and yellow.",
"completion_b": "The three primary colors are red, blue, and yellow. These colors are called primary because they cannot be created by mixing other colors and all other colors can be made by combining them in various proportions. In the additive color system, used for light, the primary colors are red, green, and blue (RGB)."
}
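For reference, here is a minimal sketch of how I inspected the instance above. It assumes the file is a top-level JSON list of objects with the "user_input", "completion_a", and "completion_b" fields shown, and that the file name matches the one quoted in this issue.

import json

# Load the comparison data (assumed to be a JSON list of dicts).
with open("comparision_data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(f"Loaded {len(data)} comparison instances")

# Print the second instance (the one quoted above) for manual review.
example = data[1]
print("user_input:  ", example["user_input"])
print("completion_a:", example["completion_a"])
print("completion_b:", example["completion_b"])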

Updated the comparison data. Please check it out.