python/docs-community

Shut down and archive https://legacy.python.org/

encukou opened this issue · 15 comments

This came up at the 2021 Language Summit.
“Thomas and Ee to address.”

On a minor sidenote, this came up on the PEPs repo.

Funny enough, this just came up in python/peps#2431; @smontanaro updated a link in PEP 339 to a paper of his to point to legacy.python.org, and I can't seem to find an equivalent on python.org or another canonical source. Maybe we need to have a subdirectory or subdomain to archive selected old but still potentially useful material like this that doesn't have a direct replacement on the current site, perhaps with redirects as appropriate?

Not sure what the argument for shutting down the legacy site is. If it's to go away, maybe the webserver can redirect references elsewhere, Wayback Machine perhaps?

It seems Thomas and Ee are @Yhg1s and @ewdurbin. Is that right? Do you know more about this action item?

To me, seeing pages like the mentioned paper, it seems like properly sunsetting legacy would be a big pile of work. Is there a plan you know of?

Yeah, I probably have some stuff there too (though I'd be happy to move it to my personal home page).

legacy.python.org is now served from a static mirror. I don't believe taking it offline is prudent.

The Apache configuration to serve that mirror is at https://github.com/python/psf-salt/blob/main/salt/hg/config/legacy.apache.conf.jinja, so new redirects could be added to slowly migrate content away.
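
As a rough sketch of how such redirects could be drafted before proposing them, the snippet below renders mod_alias `Redirect permanent` lines from a mapping of legacy paths to new URLs. The helper name `render_redirects` and the example mapping are hypothetical, and nothing here reflects what the linked config file actually contains.

```python
def render_redirects(mapping: dict[str, str]) -> list[str]:
    """Render one mod_alias `Redirect permanent` line per legacy path.

    `mapping` goes from a URL path on the legacy host to the absolute
    URL that should replace it. The entries are supplied by the caller;
    this helper only formats them.
    """
    lines = []
    for old_path, new_url in sorted(mapping.items()):
        # mod_alias expects the source to be a URL path, so it must start with "/".
        if not old_path.startswith("/"):
            raise ValueError(f"legacy path must start with '/': {old_path!r}")
        lines.append(f"Redirect permanent {old_path} {new_url}")
    return lines


if __name__ == "__main__":
    # Hypothetical example entry, for illustration only.
    example = {"/doc/essays/": "https://www.python.org/doc/essays/"}
    print("\n".join(render_redirects(example)))
```

Note that `Redirect` matches on path prefixes, so a single directory-level line also covers the pages beneath it, provided the new location keeps the same layout.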

But for the purposes of this issue (that the host in XS4ALL needed to be decommissioned) I think it can be closed.

Yhg1s commented

Indeed, the host is well and truly shut down: https://twitter.com/Yhg1s/status/1423629453461270533

Thanks for the info! Glad it's handled.

> new redirects could be added to slowly migrate content away.

If that's the goal, IMO it would be best to put the archive up as a repo on GitHub (excluding secret/ of course). AFAICS, a PSF staff member would need to release the data. (Scraping won't catch unlinked pages.)

Then anyone interested can look through it, suggest redirects at psf-salt, and remove stuff that has redirects (to keep only “todo items” in main, but everything in the history).
It wouldn't be necessary to sync the repo back to the live mirror until it's empty (if that ever happens).

But it looks like a very low-priority task. I don't think it needs to be tracked. I'll close the issue.

> If that's the goal, IMO it would be best to put the archive up as a repo on GitHub (excluding secret/ of course). AFAICS, a PSF staff member would need to release the data. (Scraping won't catch unlinked pages.)

The archive in total is ~1GB... so I'm not sure how feasible it is to make it a repo on GitHub unless people expressly want to do the work.

Are there any directories named "guido" or something like that? I might be interested in hosting those elsewhere.

There are your essays, though they are also found on the modern Python site, and there is your list of presentations, which I can't find on the modern site. Probing various forms/combinations of your name as subdirs didn't appear to reveal anything.

@ewdurbin Could you do me a favor and send me a dump of the "ppt" directory CAM mentioned? (It's served as https://legacy.python.org/doc/essays/ppt/.) I'd like to have those on my personal website.

@gvanrossum that archive has been emailed to you.

> The archive in total is ~1GB... so I'm not sure how feasible it is to make it a repo on GitHub unless people expressly want to do the work.

Git and GitHub can handle it. For GitHub, 1GB is the upper end of the “ideal” range and <5GB is recommended; for Git, I've heard similar figures for staying comfortably fast to use. The only issue I can find is the 100MB limit for pushes, so the initial push would need to be split up.
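
One way to plan that split, as a minimal sketch: walk the archive, skip excluded directories, and group files greedily into batches that each stay under a size limit, so every batch can be committed and pushed separately. The function names, the `skip_dirs` default, and the use of raw on-disk sizes are all assumptions; the 100MB figure is simply the one quoted above, and Git's compression means actual push sizes would usually be smaller than this estimate.

```python
import os

# Assumption: the 100MB figure mentioned above, treated as a per-push budget.
LIMIT_BYTES = 100 * 1024 * 1024


def iter_file_sizes(root, skip_dirs=("secret",)):
    """Yield (relative_path, size_in_bytes) for every file under root.

    Directories whose name is in skip_dirs are not descended into.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering them.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root), os.path.getsize(full)


def plan_batches(sizes, limit=LIMIT_BYTES):
    """Greedily group paths, in order, so each batch totals at most `limit`.

    A single file larger than `limit` ends up alone in its own batch,
    which makes it easy to spot and handle by hand.
    """
    batches, current, current_size = [], [], 0
    for path, size in sizes:
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

With an archive checked out at a hypothetical path, `plan_batches(iter_file_sizes("legacy-archive"))` would return the lists of files to add, commit, and push one batch at a time.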

If you send me the archive, I can put it up on GitHub – if only to make any future requests like Guido's self-service-able.