microsoft/vscode-ai-toolkit

generate <unk><unk><unk><unk><unk><unk><unk><unk>

louwangzhiyuY opened this issue · 4 comments

python console_chat.py
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [02:37<00:00, 78.94s/it]
Number of GPUs available: 1
Model ../model-cache/mistralai/Mistral-7B loaded successfully on cuda
Enter your text (type #end to stop): What is captial of canada?
### Text: What is captial of canada?

The tone is:

When the console chat is used, the answer is completely different from the gradio chat:

Number of GPUs available: 1
Model ../model-cache/mistralai/Mistral-7B loaded successfully on cuda
Enter your text (type #end to stop): What is capital of canada?
<s> ### Text: What is capital of canada?
### The tone is:
surprise </s>
Enter your text (type #end to stop):

and in browser:

capital_of_canada

Both load the same model from ../model-cache/mistralai/Mistral-7B. Why are the answers different?
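For reference, the prompt format implied by the console transcripts above can be sketched as below (the exact template ships with the toolkit's fine-tuning samples, so the helper name here is hypothetical):

```python
def build_tone_prompt(text: str) -> str:
    # Mirrors the "### Text: ... / ### The tone is:" layout seen in the
    # console output; the fine-tuned model completes the tone label.
    return f"### Text: {text}\n### The tone is:\n"

print(build_tone_prompt("What is capital of canada?"))
```

If the gradio front-end does not apply the same template, the base model has nothing to anchor on and will answer the question literally instead of emitting a tone.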

Hi @elsaco, I cannot reproduce your results; I got the same inference results from both console and gradio.

image
image

It seems to me that your gradio demo only runs the base model.
Could you check whether the adapter exists after fine-tuning and is loaded correctly at lines 22-41 in gradio_chat.py?

image

The adapter exists, but all that is returned is `surprise`.

Output after mistral-7b fine-tuning:

gradio-chat

The output is the same with the phi-2 model, so it might be a gradio-chat issue.

Ok, so the E2E fine-tuning and inferencing workflow should work in your setup.

The inference result may look random because the adapter is trained on a small, toy dataset for demonstration purposes only.
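One way to see why two front-ends can disagree on a weakly trained adapter: if either UI samples during generation, the decoded token can change from run to run, while greedy decoding is deterministic. A toy illustration (pure Python, not the toolkit's actual generation code; the tone labels are made up):

```python
import random

TONES = ["surprise", "joy", "anger"]

def greedy_pick(probs):
    # Greedy decoding: always the argmax token, identical on every call.
    return TONES[max(range(len(probs)), key=probs.__getitem__)]

def sampled_pick(probs, rng):
    # Sampled decoding: the draw depends on the RNG state, so two
    # front-ends with different seeds can return different tones.
    return rng.choices(TONES, weights=probs, k=1)[0]

# A flat-ish distribution, as an undertrained adapter might produce:
probs = [0.5, 0.3, 0.2]
print(greedy_pick(probs))  # always "surprise"
print(sampled_pick(probs, random.Random(0)))
print(sampled_pick(probs, random.Random(1)))
```

Forcing both chats to decode greedily (in transformers, `do_sample=False` in `model.generate`) is a quick way to rule sampling noise out when comparing them.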