Archive links for auction links
Opened this issue · 5 comments
YAJ listing pages don't last very long (unsure how long eBay ones last), and aucfan/the billion other proxy sites that save YAJ pages aren't guaranteed to keep them.
This could be extended to everything, but most sources (e.g. iMp95's site) have been on the internet for ages, so the likelihood of them suddenly disappearing is probably low...
Some Japanese Twitter users also purge old tweets.
Also, closedsearch is an invaluable resource for finding YAJ pages less than 6 months old.
Yep, agree we need some work here. I'm also a fan of using https://aucview.aucfan.com/yahoo/<auction ID> to be able to check deleted auctions/removed images, but even that has its limitations.
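For anyone who wants to script the aucview lookup, here is a minimal Python sketch that turns a YAJ listing URL into the matching aucview URL. The listing URL shape and the ID format (one lowercase letter followed by digits) are assumptions on my part, not something confirmed in this thread, and aucview_url is just a hypothetical helper name.

```python
import re

# Assumed shape of a Yahoo! Auctions listing URL. The ID format (one lowercase
# letter followed by digits) is an assumption, not confirmed in this thread.
YAJ_URL = re.compile(
    r"https?://page\.auctions\.yahoo\.co\.jp/jp/auction/([a-z]\d+)"
)

# Mirror URL prefix as quoted in the comment above.
AUCVIEW_BASE = "https://aucview.aucfan.com/yahoo/"


def aucview_url(listing_url):
    """Return the aucview mirror URL for a YAJ listing URL, or None."""
    match = YAJ_URL.search(listing_url)
    if match is None:
        return None  # not a recognisable YAJ listing link
    return AUCVIEW_BASE + match.group(1)
```

Anything that doesn't look like a YAJ listing link comes back as None, so it can be run over every link in a file without extra filtering.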
Some YAJ sellers go the extra step of deleting auction images after the listing is over (before the 6-month automatic auction deletion or whatever), fwiw. I don't have a list of sellers that do that on hand, but I do know cyberdaioo does it sometimes.
We already talked about an even more general solution: automatically crawling links regularly and persisting the files, e.g. using a GitHub Actions cron job that runs once a day.
One option to get this idea started might be to focus on the yahoo auction links first, and explore it with the limited scope. There might be a bunch of useful learnings from that before this can/should be scaled further.
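A rough sketch of what the Yahoo-only first step could look like as the script a daily job would run: scan the repository for YAJ links and report the auction IDs that have no archived copy yet. Everything here is an assumption for illustration: the URL pattern, the "**/*.md" glob, the one-folder-per-auction-ID archive layout, and the function names. The actual downloading is left out.

```python
import re
from pathlib import Path

# Assumed YAJ listing URL shape; the ID format is an assumption.
YAJ_URL = re.compile(
    r"https?://page\.auctions\.yahoo\.co\.jp/jp/auction/([a-z]\d+)"
)


def find_auction_ids(text):
    """Return the distinct auction IDs linked in text, in first-seen order."""
    return list(dict.fromkeys(YAJ_URL.findall(text)))


def unarchived_ids(repo_root, archive_dir, pattern="**/*.md"):
    """List auction IDs linked under repo_root that have no folder in archive_dir.

    Hypothetical layout: each archived auction lives in archive_dir/<auction ID>/.
    """
    archive = Path(archive_dir)
    archived = set()
    if archive.is_dir():
        archived = {p.name for p in archive.iterdir() if p.is_dir()}

    pending = []
    for path in sorted(Path(repo_root).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        for auction_id in find_auction_ids(text):
            if auction_id not in archived and auction_id not in pending:
                pending.append(auction_id)
    return pending
```

Keeping the "what is still missing" step separate from the fetching means the daily run only touches new links, which should keep the load on YAJ and the mirror sites low.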
Any updates on this? I am already finding dead links in the Sega boards section.
Yeah! I've been working on a small Python tool called aucscrape in the meantime. It supports finding Yahoo, eBay and Mercari auction links and retrieving and saving their metadata and media using either the original site or a number of mirror sites. Right now its scraping support is limited to Yahoo auctions, but I'm planning to add eBay and Mercari scraping when I can.
I've already run it over the repository and stored all Yahoo auctions locally, so rest assured those are safe right now. More soon!