project-baize/baize-chatbot

Trained on my own data, but the answers are not very accurate

yfq512 opened this issue · 1 comment

I collected some Chinese data about "**云南" (Yunnan), like this:
[image: 0417-2]
I then trained on it following the README, starting from Baize-7B. Training took 48 hours and produced checkpoints.
When I run app.py with these checkpoints, the AI can speak Chinese, but it sometimes mixes in English and Russian, and the answers are not very accurate.
[image: 0417]

Can you help me analyze the cause? Is it a lack of training data, the fact that the base model is an English model, or something else? Thanks.

Are you using LLaMA as the foundation model? If so, since LLaMA's pretraining data includes almost no Chinese, it's not very surprising that the outcome isn't very good. BLOOMZ may be a better foundation model.
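
To put a number on the language mixing described above, a rough check is to count which script each letter in a response belongs to. The following is only a minimal sketch using the Python standard library. The helper names `script_counts` and `non_cjk_share` are hypothetical and not part of this repository, and classifying by Unicode character name is a simplification that lumps everything outside CJK, Cyrillic, and Latin into "other".

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per script, based on each character's Unicode name."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("CJK"):
            counts["cjk"] += 1
        elif name.startswith("CYRILLIC"):
            counts["cyrillic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other"] += 1
    return counts


def non_cjk_share(text):
    """Fraction of letters that are not CJK; 0.0 for text without letters."""
    counts = script_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - counts["cjk"] / total


if __name__ == "__main__":
    sample = "你好 hello привет"
    print(script_counts(sample))
    print(round(non_cjk_share(sample), 3))
```

Running this over a batch of responses from the fine-tuned checkpoint and from the base model would show whether the fine-tuning reduced the English and Russian leakage, though it says nothing about factual accuracy.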