UTF-8 issues with invalid byte sequences.

Question

Closed this issue 9 years ago · 4 comments

Some authors with umlauts in their names can end up with author details in git blame output that contain invalid UTF-8, particularly if they commit from systems like Windows.
By calling line.encode! :invalid=>:replace before trying to match strings in get_blame, I've managed to work around the issue, but that might not be a useful fix for everyone. Anyhow, just a heads-up. :)
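
For anyone hitting the same thing, here is a minimal sketch of that kind of workaround. It is not the script's actual code: sanitize_blame_line is a hypothetical helper, the sample line is made up, and it uses String#scrub (Ruby 2.1+) rather than encode!, on the assumption that the blame output arrives as raw bytes that are supposed to be UTF-8.

```ruby
# Hypothetical helper (not part of the script): make one line of
# `git blame` output safe to match against a regular expression.
def sanitize_blame_line(line)
  # Interpret the bytes as UTF-8, then replace every invalid
  # sequence with U+FFFD so later regex matches cannot fail on it.
  line.dup.force_encoding(Encoding::UTF_8).scrub("\uFFFD")
end

# Made-up sample: "Jürgen" written as Latin-1, so the lone 0xFC byte
# is not valid UTF-8.
raw = "author J\xFCrgen".b

clean = sanitize_blame_line(raw)

# The sanitized copy is valid UTF-8 and can be matched normally.
if clean =~ /^author (.+)$/
  puts "author field: #{$1}"
end
```

The input string is left untouched; only the copy is re-tagged and scrubbed, so the original bytes stay available if they are needed elsewhere.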

Answer 1 · 2016-04-10T18:44:00.000Z

It works fine on my Linux system. Have you tried setting LC_ALL on Windows?

Answer 2 · 2016-04-11T14:44:33.000Z

Right, I was probably being a little unclear. I'm running this under bash on a Mac, but other authors use Windows machines where I cannot control the encoding that gets written. The workaround keeps your script from crashing. :)

Answer 3 · 2016-05-13T18:34:22.000Z

I see the issue now. By default Git uses UTF-8. Fixed in 62cb4df.

Answer 4 · 2016-05-16T09:01:32.000Z

Thanks @felipec !