vhf/confusable_homoglyphs

Library gives fatal error when unable to contact unicode.org

Closed this issue · 6 comments

If utils.get() is run too many times, unicode.org may throttle or even blacklist the server. When that happens, categories.py fails with the error "Datafile not found, datafile generation failed!" The unicode.org timeout is reported when the file is run from the command line, but not when it is called from django-registration.

I'm open to various ways of avoiding the timeout in the first place. Two ideas: let users specify a path to cached copies of categories.json and confusables.json (the release on PyPI does not include these files), or let them point to a mirror of those files on a different server.

Regardless, I suggest that utils.get() raise an error if it is unable to contact unicode.org, so that the errors are more informative.
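
A rough sketch of what raising could look like, using only the standard library. This is not the library's actual utils.get(); the names fetch, DataFetchError and the opener parameter are hypothetical and only illustrate turning a network failure into an error that names the URL and the cause.

```python
import urllib.error
import urllib.request


class DataFetchError(RuntimeError):
    """Hypothetical exception: a Unicode data file could not be downloaded."""


def fetch(url, timeout=10, opener=urllib.request.urlopen):
    """Return the body of `url` as text, or raise DataFetchError.

    `opener` is injectable so the function can be exercised without
    touching the network (an assumption made for this sketch).
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except (urllib.error.URLError, OSError) as exc:
        # Surface the URL and the underlying cause instead of failing
        # later with a generic "datafile not found" message.
        raise DataFetchError(
            "could not download {}: {}".format(url, exc)
        ) from exc
```

With something along these lines, a caller such as django-registration would see the timeout in the traceback rather than only the later datafile error.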

vhf commented

Thanks!

Agreed on raising, that'd be a first step.

I think we shouldn't provide these files in the PyPI package. A mirror copy is doable, though; we could host it here in the GitHub repo.

IIRC you mentioned on Twitter that the issue came from the package downloading the files and then not finding them again because their destination wasn't correct? I like the idea of being able to specify the path and I wouldn't be surprised if the default path we used here was faulty.

Sorry, I wasn't clear. I was repeatedly deploying Django to an Elastic Beanstalk environment, and each time it would delete the json files and then re-create them. I worked around that problem by including them in my repo for upload (not for distribution). They were downloaded to the root of the application directory.

vhf commented

I'm not familiar with Elastic Beanstalk. Not sure why it would delete and redownload the json files every time.

What do you think is the best solution:

  • adding a parameter to specify where the lib should locally find the JSON
  • or automatically putting the files somewhere else?

EB just deletes everything and reinstalls from a Git repo, and the strategy recommended in this tutorial is to reinstall everything from pip, so the json files are deleted. My workaround was to put them in the working directory of the repo, but it would be better if we had your first option, being able to specify where the files are locally cached.
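
To make the first option concrete, here is a minimal sketch of how a configurable location might be resolved. Everything in it is an assumption for illustration: the function name load_data_file, the data_dir parameter and the HOMOGLYPHS_DATA_DIR environment variable do not exist in the library.

```python
import json
import os


def load_data_file(name, data_dir=None, env_var="HOMOGLYPHS_DATA_DIR"):
    """Load a cached JSON data file such as categories.json.

    Lookup order (hypothetical): explicit `data_dir` argument, then the
    environment variable named by `env_var`, then the working directory.
    """
    directory = data_dir or os.environ.get(env_var) or os.getcwd()
    path = os.path.join(directory, name)
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError as exc:
        # Say where we looked so a missing cache is easy to diagnose.
        raise FileNotFoundError(
            "{} not found in {}".format(name, directory)
        ) from exc
```

On a platform that wipes the installed package on each deploy, the cached files could then live in the application repo and be found through the argument or the environment variable.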

Hello. I've opened #6, which should address this to a degree. In short, the data files would be included in the distribution, although a simple CLI would allow users to download and generate an up-to-date version of the files.

vhf commented

This has been fixed; please upgrade to 3.0.0. Here is the changelog: https://confusable-homoglyphs.readthedocs.io/en/latest/history.html#id4