Why the trained model does not produce the answer provided in the training data?

Question

Why the trained model does not produce the answer provided in the training data?

Opened this issue 3 months ago · 0 comments

I followed the format C-RLFT provided in readme, and created this data:
{"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"PingHu is near Shanghai.","weight":1.0}],"condition":"GPT4","system":""}
{"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"I don't know.","weight":0.1}],"condition":"GPT3","system":""}

I trained the model for 20 epochs and saved every 2, but all the models that I tested all produced the answer from
the original model (imone/Mistral_7B_with_EOT_token). No trained model gave the expected answer: PingHu is near Shanghai.