tesseract-ocr/langdata

Missing Norwegian special characters in desired_characters file

Andrioden opened this issue · 1 comment

The file langdata/nor/desired_characters does not contain "Ø" and "Å", which are two of the three special characters in the Norwegian language. It seems these should be added as well, just as "Æ" was added because of #36.

I also want to point out that "Ä", "É", and "Ö" are included in the desired_characters file, even though these characters have nothing to do with the Norwegian alphabet and are not used in Norwegian (unless quoting Swedish papers or such).
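For anyone who wants to verify this locally, here is a minimal sketch of a check. It assumes the file is UTF-8 text and simply treats every non-whitespace character in it as "present"; the exact file format is not confirmed here. The function name `audit_characters` and the two character sets are hypothetical, taken only from the characters named in this issue.

```python
import unicodedata
from pathlib import Path

# Characters named in this issue; illustrative sets, not an official list.
EXPECTED = {"Æ", "Ø", "Å"}
QUESTIONED = {"Ä", "É", "Ö"}


def audit_characters(text, expected=EXPECTED, questioned=QUESTIONED):
    """Return (missing_expected, present_questioned) for the file contents.

    Assumption: any non-whitespace character appearing anywhere in the
    text counts as listed in the file.
    """
    # Normalize to NFC so "A" + combining ring compares equal to "Å".
    normalized = unicodedata.normalize("NFC", text)
    present = {ch for ch in normalized if not ch.isspace()}
    missing = sorted(expected - present)
    extra = sorted(questioned & present)
    return missing, extra


if __name__ == "__main__":
    # Path as given in the issue, relative to a local checkout (assumed).
    path = Path("langdata/nor/desired_characters")
    if path.exists():
        missing, extra = audit_characters(path.read_text(encoding="utf-8"))
        print("missing:", missing)
        print("questioned but present:", extra)
    else:
        print("file not found:", path)
```

Run from the directory that contains the langdata checkout. An empty "missing" list would mean all three Norwegian letters are already in the file.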

Disclaimer: I have not tested anything related to this, only stumbled upon it and wanted to notify you.

This is still an issue.