daknob/TorPaste

Properly address UTF-8 Support

daknob opened this issue · 11 comments

Currently there's a hack in Python in order to have Flask support UTF-8 (or so I think). This particular hack should be replaced with a proper way of using UTF-8 as the default Python2 encoding instead of ASCII.

Python 3 addresses this issue by being UTF-8 by default, however it is not clear whether Flask will work with Python 3.

Since a code commit (046220f) fixed the respective issue (#18), I have opened this one to track the progress of the unhackiness.

j11e commented

I will look into it :)
Just a question: if it turns out that Flask can work with Python 3, is a switch worth considering? Or is keeping Python 2.7 the priority?

Thanks a lot, that's something important and didn't stumble upon this Flask + Python 2 Limitation before. According to the Flask documentation, Flask can in theory run in Python 3(.3+). However, if we are to consider moving to Python 3, we need to see if we have to port some code to Python 3, and then make sure all dependencies work. Currently the dependencies are Flask and Gunicorn, however in the future, especially due to the custom backends, these dependencies may increase. This is the only reason I would not recommend a transition right now.

Some potential modules for backends can be requests, python-cassandra/cassandra-driver, mysqldb, etc.

On the other hand, a quick search on the web did not yield many results for Python 2. The only two working methods (which are viable, rewriting Flask is not viable) are this method as well as a more hacky solution which basically does the same thing, however it does it inside the script.

j11e commented

As you can see in the PR, I found a way to properly handle UTF-8 that does not require switching to Python 3!

As an aside, I think switching wouldn't hurt - most libraries are now perfectly compatible with Python3 (as you can see on http://py3readiness.org/ if you didn't know). Also, if Flask can work in both Python2.7 and Python3.3+ with the same code, the same is probably feasible for TorPaste. But there are clearly other issues with a higher priority right now.

Thanks for the website, I didn't know about it.. I used to manually check the docs for every framework beforehand. Apparently Flask and gunicorn, the only important dependencies used currently have a green tick. Should we create a branch python3 and try everything there, and then merge if everything is working fine, after some time for testing and development?

j11e commented

Yes, I was thinking on doing that on my fork, but if you do it directly in upstream, all the better.

There will be some changes to be done, though; when I try to run TorPaste as-is with Python3, I had an exception right from the start (since standard modules have been renamed or moved around between Python2 and 3).

I can do it right now in upstream and then you can send PRs from your python3 branch to my python3 branch. When we're satisfied, we can merge my python3 to my master, and then you can pull that to your repo. How does that sound?

j11e commented

That sounds perfect!

The python3 branch has been pushed to this repository. You can pull it and make changes there, then send PRs. PRs will be merged even if python3 isn't ready for merging yet, so feel free to open more than one. However, we should make sure that each and every PR, when merged, successfully runs using docker. For this reason, I would recommend testing them with the command I sent you before and then proposing them for merge (removing `[WIP]).

j11e commented

Great. Should we create an associated "enhancement" issue?

Feel free to create it and I will add all the needed labels.. :-) Don't forget to mention that it's being developed in your fork as well, or make sure to open PRs even for Work in Progress!

j11e commented

I just submitted issue #32 for this. :)