Add sjl-static domains for thumbnails
Opened this issue · 6 comments
In 2006-2007, YouTube used sjl-static{number}.sjl.youtube.com to host thumbnails.
https://web.archive.org/cdx/search/cdx?url=sjl-static1.sjl.youtube.com/*&output=json&fl=original&collapse=urlkey
The only issue is that the number ranges from 1 to 16, so 16 domains would need to be checked, which is pretty unrealistic for every query.
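For reference, here is a minimal sketch of what the brute-force version would look like: it only builds the 16 CDX query URLs, using the same parameters as the link above. The names `sjl_static_hosts` and `cdx_query_url` are made up for illustration, and nothing here actually sends a request.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def sjl_static_hosts(first=1, last=16):
    """Return the sjl-static hostnames for the numbers first..last (inclusive)."""
    return [f"sjl-static{n}.sjl.youtube.com" for n in range(first, last + 1)]


def cdx_query_url(host):
    """Build a CDX query URL for every capture under the given host."""
    params = {
        "url": f"{host}/*",
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",
    }
    # Keep "/" and "*" unescaped so the URL reads like the one in the issue.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe='/*')}"


if __name__ == "__main__":
    for host in sjl_static_hosts():
        print(cdx_query_url(host))
```

That is one request per hostname, so 16 CDX requests per lookup, which is the cost being objected to here.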
Does the CDX API support wildcards in the hostname?
Looks like not directly, but it does support regex. I wonder if that can be used.
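A rough sketch of how the regex idea might look, assuming the CDX server's `filter=field:regex` and `matchType=domain` parameters can be combined for this case (I have not tested that against the live API). The pattern, the optional `:port` part, and the helper names are my own assumptions. The same pattern is also applied locally, so it could be used to filter the results of a broader query on the client side.

```python
import re
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Matches sjl-static1 .. sjl-static16 only; the optional ":port" is a guess
# at how some archived originals may be recorded.
HOST_PATTERN = r"https?://sjl-static(?:[1-9]|1[0-6])\.sjl\.youtube\.com(?::\d+)?/.*"


def regex_query_url():
    """Build a single CDX query over sjl.youtube.com with a regex filter.

    Assumption: the server accepts matchType=domain together with a
    filter on the `original` field. This function only builds the URL.
    """
    params = {
        "url": "sjl.youtube.com",
        "matchType": "domain",
        "filter": f"original:{HOST_PATTERN}",
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def is_sjl_static(original_url):
    """Client-side check: does this URL belong to one of the 16 hosts?"""
    return re.fullmatch(HOST_PATTERN, original_url) is not None


if __name__ == "__main__":
    print(regex_query_url())
    print(is_sjl_static("http://sjl-static16.sjl.youtube.com/vi/example/2.jpg"))
```

If the server-side filter turns out not to work, `is_sjl_static` could still be applied to the rows returned by a plain domain-wide query, at the cost of downloading more results.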
I think a good approach would be to grab all the links once and search that list instead, since YouTube doesn't use these domains anymore.
I don't think that's a good idea, since new WARCs may be added to the Wayback Machine at any time. We'd be missing those.
True....
What do you think about filtering for all subdomains of sjl.youtube.com, i.e. https://web.archive.org/cdx/search/cdx?url=*.sjl.youtube.com/*&output=json&fl=original&collapse=urlkey ?
Edit: Ah, I see, you can't filter for all subdomains and a specific prefix simultaneously. :/