nomic-ai/pygpt4all

An error regarding UnicodeDecodeError

funnygeeker opened this issue · 7 comments

Hello, I am using an Alpaca model that supports Chinese, but I often encounter the following error when using PyLLaMACpp:

Traceback (most recent call last):
  File "C:\Users\xxxxxxx\PycharmProjects\pyllamacpp\main.py", line 10, in <module>
    model.generate("你好", n_predict=64, new_text_callback=new_text_callback, n_threads=8, verbose=True)
  File "C:\Users\xxxxxxx\PycharmProjects\pyllamacpp\venv\lib\site-packages\pyllamacpp\model.py", line 112, in generate
    pp.llama_generate(self._ctx, self.gpt_params, self._call_new_text_callback, verbose)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 0: unexpected end of data

This model generally runs fine in other similar programs, but with PyLLaMACpp it always errors partway through generating a reply.

Model source: https://huggingface.co/P01son/ChatLLaMA-zh-7B-int4
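
For context, this error pattern is consistent with a multi-byte UTF-8 character being split across tokens, with each token decoded on its own. A minimal illustration (the character 市 is an arbitrary example, not taken from the model output):

# 市 (U+5E02) is the three bytes 0xE5 0xB8 0x82 in UTF-8; decoding only the
# first byte raises the same kind of error as in the tracebacks here
b"\xe5\xb8\x82"[:1].decode("utf-8")
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: unexpected end of data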

Additional console output:

llama_model_load: loading model from 'C:/Users/xxxxxxx/Desktop/ai/llama/chatllama-ggml-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 2052.00 MB per state)
llama_model_load: loading tensors from 'C:/Users/xxxxxxx/Desktop/ai/chatllama-ggml-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  512.00 MB
 从前,llama_generate: seed = 1681733560

system_info: n_threads = 8 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


有一个Traceback (most recent call last):
  File "C:\Users\yangc\PycharmProjects\pyllamacpp\main.py", line 10, in <module>
    model.generate("从前,", n_predict=64, new_text_callback=new_text_callback, n_threads=8, verbose=True)
  File "C:\Users\yangc\PycharmProjects\pyllamacpp\venv\lib\site-packages\pyllamacpp\model.py", line 112, in generate
    pp.llama_generate(self._ctx, self.gpt_params, self._call_new_text_callback, verbose)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: unexpected end of data

The process has ended with exit code 1.

I can confirm the problem on Windows and Linux with different models. To reproduce it, you can also ask the model to translate a word into several languages that use multi-byte Unicode encodings:
model.generate('translation "market" to korean, chinese, arabic, spanish languages is:\n', n_predict=1024, new_text_callback=new_text_callback, n_threads=8)
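
For completeness, a minimal self-contained version of that repro (a sketch only: the model path is a placeholder, and the callback simply echoes the generated text, as in the tracebacks above):

from pyllamacpp.model import Model

def new_text_callback(text: str):
    print(text, end="", flush=True)

# placeholder path: any ggml LLaMA model whose reply contains multi-byte
# UTF-8 characters should trigger the error
model = Model(ggml_model='path/to/ggml-model-q4_0.bin', n_ctx=512)
model.generate('translation "market" to korean, chinese, arabic, spanish languages is:\n',
               n_predict=1024, new_text_callback=new_text_callback, n_threads=8)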

The fix for the same issue in other projects looks like this:

import struct

def read_tokens(fin, vocab_size):
    # Read vocab_size length-prefixed token strings and their float scores,
    # replacing invalid UTF-8 sequences instead of raising.
    tokens = []
    for _ in range(vocab_size):
        text_len = struct.unpack("i", fin.read(4))[0]
        text_bytes = fin.read(text_len)
        try:
            text = text_bytes.decode("utf-8")
        except UnicodeDecodeError:
            text = text_bytes.decode("utf-8", "replace")
        score = struct.unpack("f", fin.read(4))[0]
        tokens.append((text, score))
    return tokens

But how can this be applied to the pp.llama_generate function?

I managed to use the approach @Fikavec suggested; it seems to work with the original input that caused the UnicodeDecodeError.
Not sure if this is the best way to do it, but here is what I had to do, following
https://pybind11.readthedocs.io/en/latest/advanced/cast/strings.html#returning-c-strings-to-python

1. Edited src/main.cpp so that it feeds new_text_callback() a py::bytes object, instead of implicitly converting the llama_token_to_str return value to py::str (which is probably what triggers the UnicodeDecodeError):
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -447,7 +447,9 @@ int llama_generate(struct llama_context_wrapper * ctx_w, gpt_params params, py::
         if (!input_noecho) {
             for (auto id : embd) {
 //                printf("%s", llama_token_to_str(ctx, id));
-                new_text_callback(llama_token_to_str(ctx, id));
+                std::string res = llama_token_to_str(ctx, id);
+                py::bytes py_res = py::bytes(res);
+                new_text_callback(py_res);
             }
             fflush(stdout);
         }
2. Modified the signature of new_text_callback to be Callable[[bytes], None]:
--- a/pyllamacpp/model.py
+++ b/pyllamacpp/model.py
@@ -20,6 +20,8 @@ import logging
 import sys
 import _pyllamacpp as pp
 
+import pdb
+
 
 class Model:
     """
@@ -62,7 +64,7 @@ class Model:
         # gpt params
         self.gpt_params = pp.gpt_params()
 
-        self.res = ""
+        self.res = b""
 
     @staticmethod
     def _set_params(params, kwargs: dict) -> None:
@@ -86,7 +88,7 @@ class Model:
 
     def generate(self, prompt: str,
                  n_predict: int = 128,
-                 new_text_callback: Callable[[str], None] = None,
+                 new_text_callback: Callable[[bytes], None] = None,
                  verbose: bool = False,
                  **gpt_params) -> str:
         """
@@ -105,7 +107,7 @@ class Model:
         self._set_params(self.gpt_params, gpt_params)
 
         # assign new_text_callback
-        self.res = ""
+        self.res = b""
         Model._new_text_callback = new_text_callback
3. Now we can handle a UnicodeDecodeError ourselves inside the callback:
from pyllamacpp.model import Model
import pdb
import sys

def new_text_callback(text: bytes):
    #pdb.set_trace()
    new_text = ""
    try:
        new_text = text.decode("utf-8")
    except UnicodeDecodeError:
        # fall back to U+FFFD replacement characters for partial/invalid bytes
        new_text = text.decode("utf-8", "replace")
    print(new_text, end="", flush=True)

model = Model(ggml_model='/home/ubuntu/models/gpt4all-lora-quantized-ggjt.bin', n_ctx=512)
model.generate("从前,", n_predict=64, new_text_callback=new_text_callback, n_threads=4, verbose=True)
(venv) ubuntu@llama:~/pyllamacpp$ python test.py 
llama_model_load: loading model from '/home/ubuntu/models/gpt4all-lora-quantized-ggjt.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 2052.00 MB per state)
llama_model_load: loading tensors from '/home/ubuntu/models/gpt4all-lora-quantized-ggjt.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  512.00 MB
llama_generate: seed = 1681764454

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 从前,我们认为������的是真正美好,因为���能力了解其���人的需求。然而,现在我们已经发现,这个������还有���密,那就
llama_print_timings:        load time =  3022.36 ms
llama_print_timings:      sample time =    53.13 ms /    64 runs   (    0.83 ms per run)
llama_print_timings: prompt eval time =  1451.87 ms /     5 tokens (  290.37 ms per token)
llama_print_timings:        eval time = 21015.89 ms /    63 runs   (  333.59 ms per run)
llama_print_timings:       total time = 24094.05 ms
(venv) ubuntu@llama:~/pyllamacpp$ 
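
A possible refinement (an untested sketch, not part of the changes above): instead of replacing undecodable bytes with U+FFFD, an incremental UTF-8 decoder can buffer the bytes of a character that was split across callbacks and print it once the remaining bytes arrive:

import codecs

# the incremental decoder keeps incomplete multi-byte sequences in its buffer
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

def new_text_callback(text: bytes):
    chunk = decoder.decode(text)
    if chunk:
        print(chunk, end="", flush=True)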

Thanks @r0psteev, it worked like a charm!

Great idea @r0psteev.
Can you submit a PR here so I can merge your changes?
Thank you!

Yes, sure @abdeladim-s.

This should be working now thanks to the amazing contribution of @r0psteev. Please feel free to reopen it if you still have any issues.