dense-analysis/neural

Non-English characters mangled in replies from ChatGPT

kiil opened this issue · 5 comments

kiil commented

Hi, when using this plugin to ask questions in Danish, the replies from ChatGPT come back with escaped Unicode sequences (such as \u00f8) instead of the actual Danish characters.

Example reply:

Chatbot teknologi er en computerprogram, der kan simulere en samtale med et menneske. Dette --->  g\u00f8res  <-----
ved at bruge et  --->  s\u00e6t   <---- af algoritmer at analysere samtaler og generere svar, der er relevante for den
samtale, der er i gang.

So in this case \u00f8 should have been 'ø'
and \u00e6 should have been 'æ'.

Is there a way to get the right characters back into the buffer?
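
The sequences in the example look like the `\uXXXX` escapes that JSON uses for non-ASCII characters, which suggests the reply was inserted as a raw JSON string without being decoded. A minimal Python sketch of that decoding step, assuming this is the cause (it is an illustration, not the plugin's actual code):

```python
import json

# The fragment as it appeared in the buffer: literal backslash-u escapes.
raw = r"Dette g\u00f8res ved at bruge et s\u00e6t af algoritmer"

# Wrapping the text in double quotes turns it into a JSON string literal,
# which json.loads decodes, replacing each \uXXXX escape with its character.
# Assumption: the fragment contains no unescaped quotes or control characters.
decoded = json.loads('"' + raw + '"')
print(decoded)
```

This only works on text that is still a valid JSON string body; decoding the whole response once, at the point where it is received, is more robust than patching escapes afterwards.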

@kiil I investigated this a while back and tried a few things, but eventually came to the conclusion that it's not possible to fix this in an elegant way. For more info, see neovim/neovim#14281.

Neural is in the alpha stage right now and is going through a backend rewrite. We will handle decoding UTF-8 characters in Python, which should hopefully fix this.
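
As a rough sketch of what handling this in Python could look like: decode the response bytes as UTF-8, then let the `json` module parse the body so that any `\uXXXX` escapes become real characters. The function name `extract_text` and the payload shape (`choices[0].text`) are assumptions made for illustration, not Neural's actual backend.

```python
import json


def extract_text(body: bytes) -> str:
    """Return the reply text from a JSON response body (hypothetical helper)."""
    # Decode the bytes as UTF-8 first, then parse the JSON document.
    # json.loads converts \uXXXX escapes into the characters they stand for.
    payload = json.loads(body.decode("utf-8"))
    # Assumed payload shape, for illustration only.
    return payload["choices"][0]["text"]


# A response body whose text uses JSON escapes for the Danish characters.
escaped_body = b'{"choices": [{"text": "Dette g\\u00f8res med et s\\u00e6t"}]}'
# The same reply sent as raw UTF-8 bytes instead of escapes.
utf8_body = '{"choices": [{"text": "Dette gøres med et sæt"}]}'.encode("utf-8")

print(extract_text(escaped_body))
print(extract_text(utf8_body))
```

Both forms of the body yield the same string, so text produced this way can be written to the buffer without any further unescaping.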

kiil commented

Thanks for investigating.

I hope this will (also) get sorted out upstream, eventually. UTF-8 is pretty ubiquitous these days.

w0rp commented

Hello @kiil. Is this still an issue in the current version?

kiil commented

@w0rp This seems to have been fixed. Thank you!

w0rp commented

Great! Python makes handling Unicode much easier, which is one of many reasons why we opted for it.