Incomplete Response from 4bit Version of PhoGPT

Question

DavidMediaX opened this issue 6 months ago · 1 comments

Hello, I ran some tests on the 4bit and 8bit versions of PhoGPT and ran into an issue with the 4bit version. Details are below:

Environment:
PhoGPT Version: 4bit
Execution Environment: Google Colab with T4 GPU

Issue Description:
When using the 4bit version of PhoGPT with the provided initialization code from the documentation, the model returns an incomplete response. Specifically, it only returns a newline character \n, in contrast to the 8bit version, which functions correctly and returns a comprehensive output.

Steps to Reproduce:
Initialize the 4bit PhoGPT model using the sample code from the official documentation.
Use instruction = "Viết bài văn nghị luận xã hội về an toàn giao thông" (in English: "Write a social commentary essay about traffic safety").
Observe that the response is only a newline character, indicating an incomplete or failed generation.

Expected Behavior:
The 4bit version of PhoGPT should return a complete and coherent response similar to the 8bit version, which returns detailed and lengthy outputs.

Actual Behavior:
The 4bit version outputs only a newline character \n, indicating an error or issue in processing the input prompt.

[Image: 8bit output]

[Image: 4bit output]

Answer 1 · 2024-04-03T07:52:02.000Z

This might be caused by a change in a recent release of the Transformers library.
Can you try the example from: https://huggingface.co/docs/transformers/main/en/quantization#4-bit with PhoGPT?
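
For reference, here is a minimal sketch of what trying that 4-bit example with PhoGPT could look like. It is not taken from the PhoGPT documentation. The model ID `vinai/PhoGPT-4B-Chat`, the prompt template, the `nf4` quantization type, the float16 compute dtype, and the helper names `build_prompt`, `is_blank`, and `generate_4bit` are all assumptions made for illustration, so adjust them to the checkpoint you are actually using.

```python
# Sketch only: the model ID, prompt template, and quantization settings below
# are assumptions, not values confirmed in this thread.
PROMPT_TEMPLATE = "### Câu hỏi: {instruction}\n### Trả lời:"  # assumed PhoGPT chat format


def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the assumed PhoGPT prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


def is_blank(text: str) -> bool:
    """True if the text is empty or whitespace only (e.g. a lone newline)."""
    return text.strip() == ""


def generate_4bit(instruction: str,
                  model_id: str = "vinai/PhoGPT-4B-Chat",  # assumed checkpoint
                  max_new_tokens: int = 512) -> str:
    """Load the model in 4-bit via bitsandbytes and generate a reply."""
    # Heavy imports are kept local so the helpers above work without a GPU.
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumed; "fp4" is the other option
        bnb_4bit_compute_dtype=torch.float16,  # assumed; a common choice on a T4
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
        trust_remote_code=True,
    )

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    input_ids = inputs["input_ids"].to(model.device)
    attention_mask = inputs["attention_mask"].to(model.device)

    with torch.no_grad():
        output_ids = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_new_tokens=max_new_tokens,
            do_sample=False,
        )

    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_4bit("Viết bài văn nghị luận xã hội về an toàn giao thông")` on a GPU runtime and passing the result to `is_blank` gives a quick check for whether the newline-only output still occurs with this loading path.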

We recently released 4- and 8-bit variants of PhoGPT with llama.cpp. You might want to try that too.