jgoerzen/pygopherd

Python 3 Support

Opened this issue ยท 26 comments

Hi John,

I use pygopherd on my server and I really love its ability to serve to both gopher and the web. Such a great idea.

I saw somewhere that you were attempting at one point to update it to python 3. Is that likely to happen? I've also noticed that the package is not in the Debian testing and unstable repositories (at least if I'm reading Debian's package pages correctly). So again, is Python 3 support likely?

Thanks!

Hi, I think I'm going to give this a shot! I plan to fork the repo and implement all of my changes there until I get something workable, and then I'll come back and share.

Some questions for @jgoerzen if you see this:

  • Is this something that you would even be open to merging upstream (if it meets your standard of quality) or should I plan on maintaining my own fork long-term.
  • How important is keeping backwards compatibility with old python versions? Ideally I would drop python 2 completely. I could maybe get it working with python 2.7, but it would be difficult without a compatibility shim like six which seems like a bad idea given this is a zero-dependency package. Anything below 2.7 I'm not really interested in working on.
  • pygopherd was written in a different era and understandably the code style is not "pythonic" by modern standards. How would you feel about running the codebase through a linting tool like black so my IDE doesn't barf all over me. On one hand you would lose some of the legacy of the software, on the other hand it could make it easier to spot bugs.

Hi,

This is fantastic! To answer the questions:

I would happily merge it. However, perhaps better, if you are able to maintain it long-term would be to just designate your fork as the new official one. I haven't had the time to work on it that I used to.

Backwards compatibility with old Python versions is unnecessary.

I have no problem with you doing a linting tool or something.

The one thing I would mention to watch out for is non-UTF8 filenames. Whether you tackle that problem or pretend it doesn't exist is up to you (valid arguments can be made both directions), but it should be noted. The test cases will definitely bring it to your attention :-) And the standard Python Zipfile may make it a bit challenging.

Some of the Zipfile internals are different and the interface in Python 3 doesn't expose some things that were used from Python 2. Mostly this was used for caching. This is probably a pretty obscure feature at this point. I made some progress on it, but kept getting bit by bytes/str.

As a user of the software, this is nice to see.

From my perspective, if the program's functionality is maintained, I don't really care how it works under the hood. Since there's an LTS version of Debian, I think its possible for users to keep on using pygopherd for a few years - so if an alternative is ready within that time span, it would be great.

Great!

The one thing I would mention to watch out for is non-UTF8 filenames. Whether you tackle that problem or pretend it doesn't exist is up to you (valid arguments can be made both directions), but it should be noted. The test cases will definitely bring it to your attention :-) And the standard Python Zipfile may make it a bit challenging.

I ran across some of these filenames when I was playing around with your 2007 gopherspace mirror [0]. I was able to push through it with copious amounts of .decode(errors="surrogateescape") so I'm hoping that's the answer here too.

[0] gopher://mozz.us/1/wayback/2007

Status Update

I have a fork here: https://github.com/michael-lazar/pygopherd

I ran the code through the black formatter and started fixing a bunch of obvious linting and PEP8 errors. Some of these were style-only issues:

  • using == instead of is for None comparisons.
  • using True/False instead of 1/0.
  • etc.

and others uncovered legitimate bugs:

  • using == instead of =.
  • methods with the wrong number of arguments.
  • etc.

After that, I started going through the code and adding type hints wherever possible. These type hints have been insanely helpful to uncover the places where strings or bytes should be used. I've been trying to follow the unicode sandwich technique: decode immediately after reading from the socket/file, and encode again immediately before writing. I am using errors="surrogateescape" and os.fsdecode() to handle non-UTF8 bytes.

The test suite is now passing for me on macOS and Ubuntu ๐ŸŽ‰ . I don't think I'm close to being finished yet though. I'm fairly sure the zip implementation will still crash with non-UTF8 filenames. My plan next is to systematically go through all of the handlers and protocols, test them manually, and add test coverage where I can. My initial burst of steam has died out now so work will likely be slower paced moving forward. It's fun to play around with all of the handlers though.

I really can't stress enough how helpful the type hints have been.

Fantastic! Thank you!

Status Update

I've finished adding test cases for all of the handlers and protocols now (phew!). I'm pretty confident that everything is working now. Not including weird, non-utf8 files for which I still haven't really tested yet. Everything in the testdata/ directory works though. Here are some things worth considering.

Config files

Because the pygopherd config file uses evaluated python code, unfortunately everyone's config is going to break on python 3. In particular, this part of the config is broken:

https://github.com/michael-lazar/pygopherd/blob/c863289f265e077893d63d7a65248f632b6c445f/conf/pygopherd.conf#L146

encoding = mimetypes.encodings_map.items() + \
         {'.bz2' : 'bzip2',
           '.tal': 'tal.TALFileHandler'
          }.items()

needs to be updated to

encoding = list(mimetypes.encodings_map.items()) + \
          list({'.bz2' : 'bzip2',
           '.tal': 'tal.TALFileHandler'
          }.items())

There's also the possibility of custom changes in users' personal configs that we won't be able to detect and fix for them automatically. I think the best thing to do here is communicate that config files will need to be migrated by their owners. The default config has been fixed to work with py3 in my fork.

PYG files

Like above, the pyg file handler will execute python files, so any server using .pyg files will need to migrate their scripts to work with python 3. I didn't bother to update the example file in the pygfarm/ directory which uses the python 2 only dictclient library. But I did write some tests and confirm that the handler will continue to work if you do migrate your files to py3.

mbox files

Mailboxes are working (both mbox files and directories) and I added tests for both. I did change the directory listings to use the built-in mailbox keys to identify emails instead of looping through the generator and keeping track of indexes. For example, instead of

/MAILDIR-MESSAGE/6

pointing to the "sixth" email in the directory, where the order of emails is based on whatever the generator decides, the selector now looks like

/MAILDIR-MESSAGE/1606884253.000005.mbox

I try to preserve the old indexing if you think it's a big deal to break these selectors, but it would be a bit hacky with the new python mailbox library.

simpletal

Miraculously the SimpleTAL library that the tal handler uses has been ported to python 3 and still works. The maintainer tried to upload it to PyPi but they screwed up and the files are missing and you can't actually download it through pip. For this reason, I vendored the dependency directly into my fork in order to get my tests working.

For the debian release we can switch to using python3-simpletal

I would happily merge it. However, perhaps better, if you are able to maintain it long-term would be to just designate your fork as the new official one. I haven't had the time to work on it that I used to.

I've been thinking about this some more and I would be willing to take this on if you'd like. I pretty much consider this project "feature complete" at this point so I'm not going to add new features or try to modernize it anymore. But I can keep it tested and fix things as new python versions come out.

I can work on publishing a new release and getting it up on PyPI. I'm not really interested in maintaining the debian package myself because I don't use debian, but I can work with someone if they want to handle the packaging stuff downstream.

This is all fantastic work! Yes, i am fine with the config file format changing and documenting that. The mbox plans also sound fine; good to keep the old selector working for compatibility with old links.

I could continue maintaining it for Debian (I'm a DD) and would be happy to do that. When you feel good about the quality of it, archive this repo and point it to yours.

Thanks again!

Just an update: I still intend to wrap this project up, but life got busy and I probably won't get around to it until at least January.

Thank you for the update!

Status Update

I'm back!

I finally got around to testing everything with non-UTF8 filenames. This turned out to be a huge pain because I mainly develop on a Macbook, which uses a UTF-8 only APFS filesystem. Even running debian through docker wouldn't allow me to create the files. So I got to dust off my 2010 dell laptop and re-install Ubuntu for old time's sake.

Everything filename related is working to the best of my knowledge now. I tested sending ISO 8859-1 filenames through gopher and http. The zip stuff was pretty interesting. I found your bug report on the python issue tracker. I think I came up with a decent workaround but you might want to try testing that one on your own files.

https://github.com/michael-lazar/pygopherd/blob/6e2f7bb4137f553fe4df2e476ef30a56dceca75d/pygopherd/handlers/ZIP.py#L178

The python3 syslog module only allows writing logs as utf-8, so the above filenames will show up mangled in the syslog. Everything is preserved correctly if you use the file logger though.

I changed the mbox handlers back to using a generator + looping, so old URLs should be preserved now at the cost of some computational efficiency.

I spruced up the README a bit and started putting together some Upgrade Notes and a new Changelog.

Tests are passing for python 3.7 through 3.9.

I haven't touched anything in the debian/ directory because it's out of my wheelhouse. The extra pygfarm package should be dropped IMO unless someone else wants to work on getting that upgraded.

I'm bumping the PyGopherd major version to v3.0.0 which seems like the obvious thing to do.

I uploaded the package to PyPI. Honestly I'm not sure how useful this will be, but I figured it couldn't hurt to have another "official" package source for releases besides the git repo and I'm comfortable maintaining it.

https://pypi.org/project/pygopherd/

When you feel good about the quality of it, archive this repo and point it to yours.

Another option would be to transfer ownership of this repo to me through Github. This (I think) would preserve the existing issues/stars/forks and automatically redirect any existing links. Then I could rebase my branch on top of it. This is totally up to you and I understand the value of keeping a clean separation when a project changes hands. But it's kind of fuzzy now since I'm going to make a new release under the same PyGopherd name (or should I use a new name, Py3Gopherd perhaps?)

What I need next is for some folks who are already running PyGopherd to try out my branch michael-lazar/pygopherd on their servers. If we can get it working on quux.org then it's probably stable enough to cut the release and work on getting it back into debian.

If I wanted to test this on my server (Raspbian 10/buster), would I just install via pip and then make the pygopherd.conf change?

@Shuswap You got it! I just pushed a new beta release to PyPI that's up to date with all of the changes on my branch.

python3 -m pip install pygopherd --pre

This should install v3.0.0b1 into your python site packages folder. Pip will stick the executable somewhere on your filesystem (for me /usr/local/bin/pygopherd) so just make sure you're invoking the new executable and not the old one when you launch your server.

It's up and running at gopher://gopher.visiblink.ca

Thanks so much for doing this! I know it was a fair bit of work.

First, I didn't mean to close the issue. I just wanted to close a comment and try a few other things before asking a question.

Second, if anyone else is interested in testing and you want to daemonize the process -- don't make the same mistake that I did. Don't set up your systemd unit file to run the program as the user gopher. It won't work. Omit the user. As I learned from actually reading the config file (after a lot of frustration!), Pygopherd will set the user and group itself.

Here's a systemd unit file that worked for me, for reference

[Unit]
Description=Pygopherd Server

[Service]
Type=forking
PIDFile=/var/run/pygopherd.pid
Restart=always
RestartSec=5
ExecStart=/usr/local/bin/pygopherd

[Install]
WantedBy=default.target

At first I tried running in Type=Simple mode and letting systemd manage the forking, but the code was failing at the os.setpgrp() line.

Well that's different than mine. This one's working (so far) on Raspbian 10:

[Unit]
Description=Run Pygopherd as a service
After=multi-user.target

[Service]
Type=simple
Restart=always
ExecStart=/usr/local/bin/pygopherd
TimeoutStartSec=0
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

I don't claim to understand it all though.

It seems really stable. Two weeks and no problems.

Knock on wood ;)

This is fantastic news! I hope to test it on quux.org and update the Debian packaging in the next few days.

I pretty much consider this project "feature complete" at this point so I'm not going to add new features or try to modernize it anymore.

So I know I said earlier that I wouldn't add any new features.... but I couldn't help myself ๐Ÿ˜„

I've been working on:

  • Optional TLS support for gopher / https (via looking for the SYN byte of the TLS handshake).
  • Support for the gemini protocol.

https://github.com/michael-lazar/pygopherd/pull/2/files

Oh cool! I'll be curious to see how TLS works on gopher and https (self-signed with trust on first use like gemini or certificate authority).

Thanks for all the work you've done on this Michael. It just gets better and better!

Hello everyone!

At long last, I have:

  • Packaged up this version for Debian and uploaded it
  • Tested it on quux.org

I found only one minor change from the previous, and I'm honestly not quite sure which was correct off the top of my head. If, in a .names file, there was a line saying Name= all by itself, Pygopherd2 used the filename as the display name, while Pygopherd3 made it empty.

It was readily enough addressed with:

find . -name .names -exec sed -i 's/^Name=$//g' {} +

My Debian packaging has added a systemd unit file which is different from both of the examples here. By using detach=no in pygopherd.conf, we can get a much easier semantic with systemd.

I will follow up with a PR to your repo that makes those minor tweaks to the conf file, includes an example systemd unit file for non-Debian users, and updates the documentation in doc/.

Question:

I assume that going forward, you would like to use your repo as the canonical location for pygopherd? That is fine by me; in that case, I will update the README here to point to yours and close other things out in this one.

Cool! I merged your PR and bumped the official version number to v3.0.0 ๐ŸŽ‰. I'm glad to see this is making its way back into Debian.

I assume that going forward, you would like to use your repo as the canonical location for pygopherd? That is fine by me; in that case, I will update the README here to point to yours and close other things out in this one.

Yea that's good with me, I have no plans of ever taking it down and will do my best to keep it up to date with new python versions, etc. I also invited you as a project collaborator if you would like to have maintainer access.