On Colab, UnicodeDecodeError
LeMoussel opened this issue · 9 comments
Hello, I am using pygpt4all on Colab, but I often encounter the following error:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
[<ipython-input-2-1545ea4aa0da>](https://localhost:8080/#) in <cell line: 8>()
6
7 model = GPT4All_J('./ggml-gpt4all-j-v1.2-jazzy.bin')
----> 8 model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback)
[/usr/local/lib/python3.9/dist-packages/pygptj/model.py](https://localhost:8080/#) in generate(self, prompt, new_text_callback, n_predict, seed, n_threads, top_k, top_p, temp, n_batch)
122
123 # run the prediction
--> 124 pp.gptj_generate(self.gpt_params, self._model, self._vocab, self._call_new_text_callback)
125 return self.res
126
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 0: invalid start byte
Colab source code:
!pip install -q pygpt4all
!wget https://gpt4all.io/models/ggml-gpt4all-j-v1.2-jazzy.bin
# https://github.com/nomic-ai/pygpt4all
from pygpt4all.models.gpt4all_j import GPT4All_J
def new_text_callback(text):
    print(text, end="", flush=True)
model = GPT4All_J('./ggml-gpt4all-j-v1.2-jazzy.bin')
model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback)
I am getting this issue as well on Ubuntu 22.04 VirtualBox VM, I am just trying to run the code example in README file
Also getting this on Ubuntu 20.04 (WSL2) using Python 3.11.2. Tried using a fresh venv and installing the versions below, but still the same problem.
pygpt4all 1.0.1
pygptj 1.0.5
I'm experiencing this as well on Ubuntu 22.04, Python 3.10.9.
I too was just trying to run the code example from the README. I also hit some additional errors with unknown tokens (see the following additional output):
---------------------------------------------------------------------------
model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback)
gptj_generate: seed = 1682362796
gptj_generate: number of tokens in prompt = 3
gpt_tokenize: unknown token ' '
gpt_tokenize: unknown token 'a'
gpt_tokenize: unknown token 'm'
gpt_tokenize: unknown token 'e'
gpt_tokenize: unknown token ','
gpt_tokenize: unknown token ' '
Once upon tiTraceback (most recent call last):
File "/opt/python/mambaforge/envs/cbot4me/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-6-0c9a6b32721a>", line 1, in <module>
model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback)
File "/opt/python/mambaforge/envs/cbot4me/lib/python3.10/site-packages/pygptj/model.py", line 124, in generate
pp.gptj_generate(self.gpt_params, self._model, self._vocab, self._call_new_text_callback)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 0: invalid start byte
Separately, I did try to apply the update mentioned in the comment on the similar (but different) issue #61. I actually got it to generate some text successfully, but it ended up failing again after only a few tokens.
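For anyone wanting to work around this on the Python side: a plausible cause (an assumption, not confirmed in this thread) is that the callback receives raw bytes in which a multi-byte UTF-8 character is split across two chunks, so decoding each chunk independently fails. A minimal sketch using Python's incremental decoder, with a hypothetical `decode_stream` helper, would look like this:

```python
import codecs

def decode_stream(chunks):
    """Decode an iterable of byte chunks, tolerating characters that are
    split across chunk boundaries (the suspected cause of the
    UnicodeDecodeError above). The decoder buffers an incomplete
    multi-byte sequence until the rest of it arrives."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    out = [decoder.decode(chunk) for chunk in chunks]
    out.append(decoder.decode(b"", final=True))  # flush any remainder
    return "".join(out)

# '\u00ad' (soft hyphen) encodes as b'\xc2\xad'; a naive chunk.decode("utf-8")
# would crash on the lone b'\xad' continuation byte in the third chunk.
print(decode_stream([b"Once upon a ti", b"\xc2", b"\xad", b"me"]))
```

This only helps if the binding exposes the bytes before decoding them; in pygptj the failing decode happens inside the compiled extension, so a proper fix has to land there.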
Thanks @LeMoussel for reporting the issue.
I just tried it on Colab and you are right. I don't know why building it from source does not raise this error; it may be something related to the pre-built wheels process!
I pushed a new version of pygptj (v1.0.8) to handle the error as @cbrendanprice mentioned, but it is still not working as expected.
import logging

from pygptj.model import Model

def new_text_callback(text):
    print(text, end="")

model = Model('/home/su/Downloads/ggml-gpt4all-j.bin', log_level=logging.ERROR)
model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback)
I tried a lot but couldn't figure out yet why the prebuilt wheels are not working properly!
Please let me know if you have any idea!
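One observation that may help narrow it down: 0xad is a UTF-8 continuation byte, so decoding it in isolation reproduces the exact error from the tracebacks above, while the full two-byte sequence it belongs to decodes fine. A minimal reproduction, independent of pygptj:

```python
# b'\xad' alone is a UTF-8 continuation byte; decoding it in isolation
# raises the same error seen in the tracebacks above.
try:
    b"\xad".decode("utf-8")
except UnicodeDecodeError as err:
    print(err)  # 'utf-8' codec can't decode byte 0xad in position 0: invalid start byte

# The complete sequence b'\xc2\xad' (U+00AD, soft hyphen) decodes fine,
# which suggests the generator is emitting a character one byte at a time.
print(repr(b"\xc2\xad".decode("utf-8")))
```

If the wheels and the source build ship different tokenizer or string-handling code, the wheels may be splitting multi-byte characters at token boundaries where the source build does not.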
Same error for me (Ubuntu 22.04 running on WSL2); tried with both pygptj and pygpt4all.
Same problem on Ubuntu 20.04
model = GPT4All_J(model_path='./models/ggml-gpt4all-j-v1.3-groovy.bin', log_level=logging.ERROR)
model.generate(prompt, n_predict=50, new_text_callback=new_text_callback)
gptj_generate: seed = 1682513212
gpt_tokenize: unknown token ' '
gpt_tokenize: unknown token 'a'
gpt_tokenize: unknown token 'm'
gpt_tokenize: unknown token 'e'
gpt_tokenize: unknown token ','
gpt_tokenize: unknown token ' '
gptj_generate: number of tokens in prompt = 3
Hello guys,
Please update the package and give it a try now.
I believe the issue should be solved (ensure you have pygptj version 1.0.10).
It works for me. Thx!
Hi, it's OK with pygptj version 1.0.10.