generate <unk><unk><unk><unk><unk><unk><unk><unk>
louwangzhiyuY opened this issue · 4 comments
python console_chat.py
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [02:37<00:00, 78.94s/it]
Number of GPUs available: 1
Model ../model-cache/mistralai/Mistral-7B loaded successfully on cuda
Enter your text (type #end to stop): What is captial of canada?
### Text: What is captial of canada?
The tone is:
If the console chat is used, the answer is totally different from the one in the gradio chat:
Number of GPUs available: 1
Model ../model-cache/mistralai/Mistral-7B loaded successfully on cuda
Enter your text (type #end to stop): What is capital of canada?
<s> ### Text: What is capital of canada?
### The tone is:
surprise </s>
Enter your text (type #end to stop):
And in the browser:
Both load the model from ../model-cache/mistralai/Mistral-7B. Why are the answers different?
Hi @elsaco, I cannot reproduce your results; I got the same inference results from both the console and gradio.
It seems to me that your gradio demo only runs the base model.
Could you check whether the adapter exists after fine-tuning and is loaded correctly in lines 22-41 of gradio_chat.py?
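A minimal sketch of that check in Python, assuming the adapter was saved with PEFT's `save_pretrained()`, which writes `adapter_config.json` plus the weights as `adapter_model.safetensors` (newer PEFT versions) or `adapter_model.bin` (older ones). `ADAPTER_DIR` is a hypothetical path, and `adapter_files_present` / `load_with_adapter` are illustrative helpers, not functions from this repo; compare against what gradio_chat.py actually does.

```python
from pathlib import Path

# Hypothetical location: replace with the output directory of your fine-tuning run.
ADAPTER_DIR = Path("../model-cache/adapter")


def adapter_files_present(adapter_dir: Path) -> bool:
    """Return True if adapter_dir looks like a saved PEFT adapter.

    Checks for adapter_config.json plus one of the two weight-file names
    that PEFT's save_pretrained() is known to produce.
    """
    if not (adapter_dir / "adapter_config.json").is_file():
        return False
    return any(
        (adapter_dir / name).is_file()
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )


def load_with_adapter(base_model, adapter_dir: Path):
    """Wrap an already-loaded base model with the adapter (illustrative helper).

    peft is imported lazily so the file check above runs without it installed.
    """
    from peft import PeftModel

    if not adapter_files_present(adapter_dir):
        # Without this guard, a missing adapter would silently leave you on the base model.
        raise FileNotFoundError(f"No PEFT adapter found in {adapter_dir}")
    return PeftModel.from_pretrained(base_model, str(adapter_dir))


if __name__ == "__main__":
    print(f"{ADAPTER_DIR}: adapter found = {adapter_files_present(ADAPTER_DIR)}")
```

If the file check passes but gradio still answers like the base model, it is worth confirming that the object passed to the chat function is the wrapped `PeftModel` rather than the original base model.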
OK, so the E2E fine-tuning and inference workflow should work in your setup.
The inference results may look random because the adapter is trained on a small toy dataset for demonstration purposes only.