simonw/llm

UTF8 Surrogates Not Allowed


Something in the text returned from GPT-4o can't be logged to the database.

  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 3310, in insert_all
    self.insert_chunk(
  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 3068, in insert_chunk
    result = self.db.execute(query, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 524, in execute
    return self.conn.execute(sql, parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc81' in position 14511: surrogates not allowed

Workaround: disable logging and run the prompt again.

PS> cat .\transcript.csv | llm -m 4o -s "Extract each place name."
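
Another option, instead of turning logging off, might be to scrub lone surrogates from the text before it is written to SQLite. This is only a sketch, assuming the offending character is a lone surrogate like the '\udc81' in the traceback. `scrub_surrogates` is a hypothetical helper, not something llm or sqlite-utils provides.

```python
def scrub_surrogates(text: str) -> str:
    """Return text with lone surrogate code points replaced by '?'.

    Hypothetical helper for illustration only.
    """
    # With errors="replace" the UTF-8 encoder substitutes "?" for each
    # code point it cannot encode, which includes lone surrogates.
    return text.encode("utf-8", errors="replace").decode("utf-8")


sample = "place\udc81name"           # contains the surrogate from the traceback
cleaned = scrub_surrogates(sample)   # now safe to encode as UTF-8
```

This loses whatever byte the surrogate stood for, so it trades fidelity for a log entry that can actually be stored.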

It would be nice to have a small reproducer file.
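
In the meantime, something like the following might serve as a starting point. It is a minimal sketch that does not involve llm or sqlite-utils at all. It assumes the failure comes down to binding a string containing a lone surrogate as a parameter in Python's `sqlite3` module, which is where the traceback ends. The `responses` table and the `try_insert` function are made up for illustration.

```python
import sqlite3


def try_insert(text: str) -> str:
    """Insert text into a throwaway in-memory table and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses (response TEXT)")  # hypothetical table
    try:
        # sqlite3 encodes str parameters as UTF-8 when binding them
        conn.execute("INSERT INTO responses (response) VALUES (?)", (text,))
    except UnicodeEncodeError as exc:
        return f"failed: {exc.reason}"
    finally:
        conn.close()
    return "ok"


print(try_insert("Paris"))          # ordinary text
print(try_insert("Paris\udc81"))    # same lone surrogate as in the traceback
```

If the second call fails the same way as the traceback, the problem can be reproduced without the original transcript.csv or a model call.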