THUDM/ChatGLM-6B

After fine-tuning, the same checkpoint produces drastically different results on the same validation set in evaluate mode vs. deployment mode

BirderEric opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

After fine-tuning, predicting the same dev set with the same checkpoint yields very different results depending on whether I use the evaluate script or the deployed model: accuracy is 83% with the evaluate script but only 39% via deployment. Has anyone run into the same situation?

Expected Behavior

No response

Steps To Reproduce

  1. Run P-Tuning with my own train.json
  2. Predict dev.json with evaluate.sh using the checkpoint
  3. Predict dev.json with the model.chat function using the same checkpoint (see the sketch after this list)
  4. The results and accuracy differ significantly between the two (83% vs. 39%)
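For reference, step 3 loads the P-Tuning checkpoint for deployment roughly as below. This is a minimal sketch following the pattern in the ChatGLM-6B ptuning README; the model path, checkpoint path, and PRE_SEQ_LEN are assumptions and must match the values used during training:

```python
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Assumed paths / hyperparameters -- adjust to your own setup
MODEL_PATH = "THUDM/chatglm-6b"
CHECKPOINT_PATH = "output/checkpoint-3000"  # hypothetical checkpoint directory
PRE_SEQ_LEN = 128                           # must equal the pre_seq_len used for P-Tuning

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=PRE_SEQ_LEN)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True)

# Load only the prefix-encoder weights from the P-Tuning checkpoint
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Same precision handling as the README: half-precision model, fp32 prefix encoder
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

# One dev.json sample as an example query
response, history = model.chat(tokenizer, "your dev.json prompt here", history=[])
print(response)
```

If the deployment path differs from this (e.g. different pre_seq_len, missing prefix-encoder loading, or quantization), that alone could explain the gap against the evaluate.sh results.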

Environment

- OS: Ubuntu 20.04
- Python: 3.9
- Transformers: 4.33.1
- PyTorch: 2.0.1
- CUDA Support: true

Anything else?

No response