LineReader stops reading when it hits a character like "É" or "ñ"
So you have a textfile such as:
diner
restaurant
lunch-spot
greasy spoon
café // "é" character
coffee shop
cafeteria
LineReader stops reading when it hits the "café" line above. Never gets to "coffee shop".
Maybe the file is not encoded using UTF-8? I use NSUTF8StringEncoding in the FileReader. See (NSString*)readLine in line 72. Maybe you can find a way to discover the encoding type of the file before you start reading its content. You are welcome to fork the project.
Hi, I still have this problem.
Have you verified which character encoding is used by the file you are trying to read?
Hi, it's Unicode (UTF-8)
Could you upload a zipped sample somewhere? Then I will find the time to take a look at it in a few days.
I think you can create a new document with some characters like í, é, ñ. Or I will upload some sample data.
I think you should really upload an example file somewhere. I can write an ñ into either an extended-ASCII (e.g. ISO-8859-1) or a UTF-8 encoded file, and the bytes on disk will differ.
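To illustrate the point, here is a minimal sketch in Python (not LineReader's Objective-C, and not code from the project) showing that the same character is stored as different bytes depending on the encoding, and that the single-byte form is not valid UTF-8. The helper name decodes_as_utf8 is hypothetical.

```python
# Same character, two encodings (illustration only; not LineReader code).
latin1_bytes = "ñ".encode("iso-8859-1")  # one byte: 0xF1
utf8_bytes = "ñ".encode("utf-8")         # two bytes: 0xC3 0xB1


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A reader that insists on UTF-8 can decode utf8_bytes but not latin1_bytes, which would be consistent with reading stopping at the first accented character.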
You can also find out yourself about the character encoding used in the file with an editor. If you are using Windows I recommend Notepad++. On Mac OS X or Linux run the following command in a shell: $ file filename
This is the file's info: Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators.
This is the file: http://www.mediafire.com/?1cwr4if28w504md
It has an "î" character.
Agreed. As I suspected, the file is not encoded as UTF-8.
I converted the file to UTF-8 using Notepad++ (options are visible in the menu) so you can try again with this file.
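If you would rather script the conversion than use an editor, a rough Python sketch follows. The function name convert_to_utf8 is hypothetical, and the default source encoding of Windows-1252 is only an assumption: the file command reported "Non-ISO extended-ASCII" but did not name the exact code page.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file as UTF-8 (hypothetical helper).

    source_encoding is an assumption; pass the real encoding if you know it.
    Bytes are read and written in binary mode, so CRLF line endings are kept.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError on undefined bytes
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, convert_to_utf8("words.txt", "words-utf8.txt") would write a UTF-8 copy next to the original; both file names here are placeholders.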
Maybe we should automatically convert every file to UTF-8 before we start reading its content.
I suggest that you look for a way to recognize the character encoding up front. Feel free to add it to the LineReader.
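One possible way to approach this, sketched in Python rather than in LineReader's Objective-C: try strict UTF-8 first and fall back to single-byte encodings only when that fails. This is a heuristic, not real detection, since a single-byte fallback can succeed while still producing the wrong characters. The function name decode_with_fallback and the candidate list are assumptions for illustration.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    data: bytes,
    candidates: Sequence[str] = ("utf-8-sig", "cp1252", "iso-8859-1"),
) -> Tuple[str, str]:
    """Decode data with the first candidate encoding that accepts it.

    Returns (text, encoding_used). The candidate order is an assumption:
    strict UTF-8 (with or without a BOM) first, then Windows-1252, then
    ISO-8859-1, which accepts every byte and so acts as the last resort.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")
```

With this ordering, a valid UTF-8 file is never misread as a single-byte encoding, and a file like the sample above would be decoded on the second attempt instead of stopping the reader.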