UB-Mannheim/tesseract

Languages should be sorted

THausherr opened this issue · 6 comments

Environment

  • Tesseract Version: v5.0.1.20220107
  • Platform: Windows 10.0.19043.1645, 64-bit

Current Behavior:

[image]

Expected Behavior:

Languages should be sorted

Suggested Fix:

Sort

I think they are sorted by filename (deu.traineddata for German).

Ideally the list should use localized names ("Deutsch" for users who selected the German user interface) and sort those localized names. Do you want to implement that and send us a pull request?

Sorry, no, not enough time, sadly.

Nor do I have enough time. Maybe someone else has an idea of how this can be done with reasonable effort.

@stweil I would like to TRY and give this a go. I have been looking for the files / list in the repo, but I cannot seem to find them. Where is the code that generates the installer package?

Can you please give some pointers if possible?

filak commented

The language selection is sorted by the Tesseract language code, but only the description is displayed, so the list looks messy.

https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc#languages-and-scripts

 cos (Corsican), cym (Welsh), dan (Danish), deu (German), div (Dhivehi)
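
To illustrate the mismatch, here is a minimal Python sketch using only the five codes from the excerpt above. The function names are hypothetical and not part of Tesseract or the installer; the sketch only shows that ordering by code and then displaying the description gives a different order than sorting the descriptions themselves.

```python
# Codes and descriptions copied from the man page excerpt above.
LANGUAGES = {
    "cos": "Corsican",
    "cym": "Welsh",
    "dan": "Danish",
    "deu": "German",
    "div": "Dhivehi",
}


def names_sorted_by_code(languages):
    """Return the descriptions in the order of their language codes."""
    return [languages[code] for code in sorted(languages)]


def names_sorted_by_name(languages):
    """Return the descriptions sorted alphabetically, ignoring case."""
    return sorted(languages.values(), key=str.casefold)


if __name__ == "__main__":
    # Order a user sees when the list is sorted by code:
    print(names_sorted_by_code(LANGUAGES))
    # Order a user would expect when reading the descriptions:
    print(names_sorted_by_name(LANGUAGES))
```

With these five entries, "Welsh" (cym) lands between "Corsican" and "Danish" when sorted by code, which is why the displayed list looks unsorted.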

https://github.com/UB-Mannheim/tesseract/blob/windows/nsis/tesseract.nsi

 Section /o "German" SecLang_deu

Maybe a quick fix would do?

 Section /o "deu - German" SecLang_deu
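
The quick fix could be applied mechanically rather than by hand. The following Python sketch assumes that every language section in tesseract.nsi follows exactly the pattern shown above (`Section /o "<name>" SecLang_<code>`); that assumption has not been checked against the whole file, and the helper name `prefix_language_code` is made up for illustration.

```python
import re

# Assumed line shape: Section /o "German" SecLang_deu
# Group 1: everything up to and including the opening quote
# Group 2: the displayed title
# Group 3: closing quote plus the section identifier
# Group 4: the language code taken from the identifier
SECTION_RE = re.compile(r'^(\s*Section\s+/o\s+")([^"]+)("\s+SecLang_(\w+)\s*)$')


def prefix_language_code(line):
    """Prepend the language code to the section title, if the line matches."""
    match = SECTION_RE.match(line)
    if match is None:
        return line  # not a language section: leave untouched
    head, title, tail, code = match.groups()
    if title.startswith(code + " - "):
        return line  # already prefixed: keep the rewrite idempotent
    return f"{head}{code} - {title}{tail}"


if __name__ == "__main__":
    print(prefix_language_code(' Section /o "German" SecLang_deu'))
```

Lines that do not match the assumed pattern are returned unchanged, so the function could be run over every line of the script; the result would still need a manual review before being committed.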