eikek/docspell

Add more languages

tiborrr opened this issue · 5 comments

I currently use the following languages in another application. I think it would be good to add them, because then we would cover all of Europe; I have invoices from all over Europe. I would like to know your thoughts on this. Perhaps we should think of a way to let users add languages dynamically as they see fit, because adding all of these by default would increase the image by about 14 × approximately 8 MB = 112 MB (installed size).

+   tesseract-ocr-data-bel \
+   tesseract-ocr-data-bos \
+   tesseract-ocr-data-bul \
    tesseract-ocr-data-ces \
    tesseract-ocr-data-dan \
    tesseract-ocr-data-deu \
+   tesseract-ocr-data-ell \
    tesseract-ocr-data-eng \
    tesseract-ocr-data-est \
    tesseract-ocr-data-fin \
    tesseract-ocr-data-fra \
    tesseract-ocr-data-heb \
+   tesseract-ocr-data-hrv \
+   tesseract-ocr-data-hun \
+   tesseract-ocr-data-isl \
    tesseract-ocr-data-ita \
    tesseract-ocr-data-jpn \
+   tesseract-ocr-data-kat \
    tesseract-ocr-data-lav \
    tesseract-ocr-data-lit \
+   tesseract-ocr-data-ltz \
+   tesseract-ocr-data-mkd \
+   tesseract-ocr-data-mlt \
    tesseract-ocr-data-nld \
    tesseract-ocr-data-nor \
    tesseract-ocr-data-pol \
    tesseract-ocr-data-por \
    tesseract-ocr-data-ron \
    tesseract-ocr-data-rus \
    tesseract-ocr-data-slk \
+   tesseract-ocr-data-slv \
    tesseract-ocr-data-spa \
+   tesseract-ocr-data-srp \
    tesseract-ocr-data-swe \
+   tesseract-ocr-data-tur \
    tesseract-ocr-data-ukr \
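
To make the "dynamic" idea a bit more concrete, here is a rough sketch of how a user-supplied list of language codes could be turned into the package names used above. This is only an illustration: `OCR_LANGS` and `ocr_packages` are hypothetical names I made up, not existing docspell settings, and the `apk` line in the comment assumes an Alpine-based image.

```shell
#!/bin/sh
# Hypothetical helper: turn a space-separated list of language codes
# into the matching tesseract-ocr-data-* package names.
ocr_packages() {
    pkgs=""
    for lang in $1; do
        # Append with a single space separator, no leading space.
        pkgs="${pkgs:+$pkgs }tesseract-ocr-data-$lang"
    done
    printf '%s\n' "$pkgs"
}

# OCR_LANGS is an assumed variable name; fall back to a small default set.
OCR_LANGS="${OCR_LANGS:-eng deu fra}"
ocr_packages "$OCR_LANGS"

# An image build could then install only the requested packages, e.g.:
#   apk add --no-cache $(ocr_packages "$OCR_LANGS")   # assumes Alpine
```

With something like this, the default image could stay small, and anyone who needs the extra languages would only pay the roughly 8 MB per language they actually request.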