Incomplete Response from 4bit Version of PhoGPT
DavidMediaX opened this issue · 1 comment
Hello, I ran some tests on the 4-bit and 8-bit versions of PhoGPT. I hit an issue with the 4-bit version; details are below:
Environment:
PhoGPT Version: 4-bit
Execution Environment: Google Colab with T4 GPU
Issue Description:
When using the 4-bit version of PhoGPT with the initialization code provided in the documentation, the model returns an incomplete response. Specifically, it returns only a newline character \n, whereas the 8-bit version works correctly and returns a complete, detailed output.
Steps to Reproduce:
Initialize the 4-bit PhoGPT model using the sample code from the official documentation.
Use instruction = "Viết bài văn nghị luận xã hội về an toàn giao thông" ("Write a social commentary essay on traffic safety").
Observe that the response is only a newline character, indicating an incomplete or failed generation.
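For reference, the steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the PhoGPT docs: the checkpoint id, prompt wrapper, and 4-bit settings (NF4 with bf16 compute) are assumptions that may differ from the official example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed checkpoint id -- substitute the one from the official PhoGPT docs.
model_id = "vinai/PhoGPT-7B5-Instruct"

# 4-bit quantization config (NF4 + bf16 compute is a common choice;
# the docs' exact settings may differ).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

instruction = "Viết bài văn nghị luận xã hội về an toàn giao thông"
# The instruction is wrapped in PhoGPT's prompt template; check the docs
# for the exact wrapper (the one below is an assumption).
prompt = f"### Câu hỏi: {instruction}\n### Trả lời:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens; with the reported bug this
# comes back as just "\n" for the 4-bit model.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```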
Expected Behavior:
The 4-bit version of PhoGPT should return a complete and coherent response comparable to the 8-bit version, which produces detailed, full-length outputs.
Actual Behavior:
The 4-bit version outputs only a newline character \n, indicating an error in processing the input prompt.
This might be caused by a recent change in the Transformers library.
Can you try the example from https://huggingface.co/docs/transformers/main/en/quantization#4-bit with PhoGPT?
We recently released 4-bit and 8-bit variants of PhoGPT for llama.cpp. You might want to try those as well.