Download links in the ReadMe are insecure
Closed this issue · 2 comments
Long story short, the download links from tessdata.projectnaptha.com cause some browsers (Vivaldi and Brave, at the very least) to give a warning along the lines of "this file cannot be downloaded securely". Maybe this has to do with outdated security certificates or something? I don't know.
The longer version is that I was trying a browser extension (https://chromewebstore.google.com/detail/ocr-image-to-text-image-r/hoknogdfolliknmnnglffgmfflcdpdih) that uses the trained data to do in-browser OCR. The extension downloads the traineddata.gz files from the same locations listed in the readme of this very GitHub project (naptha/tessdata). It first tries the projectnaptha.com links, and then fails over to the GitHub links. Since the projectnaptha.com links get flagged as insecure, this causes problems: Vivaldi handles the failover properly, while Brave does not, making the extension unusable on Brave. Hopefully someone in this project can pass along the issue of insecure downloads to the webmaster, admin, or owner of the projectnaptha.com site.
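For illustration, the try-one-source-then-fail-over behavior described above could look roughly like the sketch below. This is not the extension's actual code: `fetchWithFailover`, the `Fetcher` type, and the `example` URLs are hypothetical names made up for this example, and it assumes a blocked download surfaces as a rejected promise or a non-OK response, which I have not verified for either browser.

```typescript
// Minimal shape of a fetch-like function, so the sketch does not depend on a
// particular runtime. In a browser, `(url) => fetch(url)` satisfies this type.
type Fetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Hypothetical helper: try each URL in order and return the first body that
// downloads successfully, along with the URL it came from.
async function fetchWithFailover(
  urls: string[],
  fetcher: Fetcher,
): Promise<{ url: string; data: ArrayBuffer }> {
  const errors: string[] = [];
  for (const url of urls) {
    try {
      const res = await fetcher(url);
      if (!res.ok) {
        // Treat HTTP errors like any other failure and move to the next source.
        errors.push(`${url}: HTTP ${res.status}`);
        continue;
      }
      return { url, data: await res.arrayBuffer() };
    } catch (err) {
      // Network-level failures (including a blocked request, if the browser
      // reports it as a rejection) land here.
      errors.push(`${url}: ${String(err)}`);
    }
  }
  throw new Error(`All sources failed:\n${errors.join("\n")}`);
}
```

A caller would pass the sources in priority order, e.g. `fetchWithFailover([primaryUrl, fallbackUrl], (url) => fetch(url))`. If a browser blocks the first request in a way that neither rejects nor resolves, a loop like this would never reach the second URL, which might be one explanation for the difference between Vivaldi and Brave.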
I think this is an issue for the individual application at issue rather than Tesseract.js or anything here. You should open an Issue with that project.
- The `tessdata.projectnaptha.com` site was used as a default location for language data in older versions of Tesseract.js. This is no longer the default location, and the site is no longer updated.
  - This was a simple GitHub Pages site; however, we are now over the GitHub Pages size limit, so the site no longer updates with new content.
- The site is being left as-is to avoid breaking old code.
- As a result, any project that points to this location is either (1) using an older version of Tesseract.js or (2) setting the language data to that location manually.
- In either case, fixing is under the purview of that project.
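
For a project that wants to check which case applies, a rough sketch of a test for whether a configured language-data URL still points at the old site is below. The helper name `pointsAtLegacyHost` is hypothetical and not part of Tesseract.js; it assumes the project can read the URL it passes as its language-data location (the option Tesseract.js documents as `langPath`).

```typescript
// Hostname of the legacy site discussed in this thread.
const LEGACY_HOST = "tessdata.projectnaptha.com";

// Hypothetical helper: returns true when the given language-data location is
// an absolute URL whose host is the legacy site.
function pointsAtLegacyHost(langPath: string): boolean {
  try {
    return new URL(langPath).hostname.toLowerCase() === LEGACY_HOST;
  } catch {
    // Relative paths and malformed strings are not absolute URLs, so they
    // cannot point at the legacy host.
    return false;
  }
}
```

If this returns `false` for every path a project sets explicitly and requests still go to the old site, the likely cause is case (1), an older Tesseract.js version using its built-in default.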
Follow-up: I realized that although my above comment is correct, the title of this issue is "Download links in the ReadMe are insecure", and the readme was indeed still linking to the deprecated site. That is definitely an issue. Therefore, I've updated the readme to no longer link to this site, and to better explain where we advise getting the data from.