newhouse/url-tracking-stripper

Additional redirs

Opened this issue · 4 comments

//youtu.be/foo => https://www.youtube.com/watch?v=foo

Also, y2u.be works like youtu.be but is not run by Google (!). That can't be good.

https://redd.it/7tczf9 => https://www.reddit.com/tb/7tczf9

https://app.instapage.com/route/9475232/?url=www.nat.ai/careers => http://www.nat.ai/careers
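For the two fixed-pattern shorteners above, the target can be derived from the link itself without following it. Here is a rough sketch of that idea in Python, not the extension's actual code: the `SHORT_HOSTS` table and `expand_short_link` are made-up names, and treating y2u.be the same as youtu.be is only an assumption based on the report above.

```python
from urllib.parse import urlsplit

# Hypothetical table: short host -> template for the long URL.
# The entries mirror the examples in this issue; the y2u.be entry
# assumes it maps IDs exactly like youtu.be does.
SHORT_HOSTS = {
    "youtu.be": "https://www.youtube.com/watch?v={id}",
    "y2u.be": "https://www.youtube.com/watch?v={id}",
    "redd.it": "https://www.reddit.com/tb/{id}",
}

def expand_short_link(url):
    """Return the expanded URL, or None if the host is not in the table."""
    parts = urlsplit(url)
    template = SHORT_HOSTS.get((parts.hostname or "").lower())
    ident = parts.path.strip("/")
    # Only handle the simple "/<id>" shape seen in the examples.
    if template is None or not ident or "/" in ident:
        return None
    return template.format(id=ident)
```

Anything not in the table returns `None`, so the caller can fall back to following the redirect as it does today.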

BTW, related but not related: I have a list of ~30 domain shorteners plus 3,200 "bit.ly custom shorteners". These are often used to add tracking CGI args to links that look like they don't have any, but it's hard to do anything useful with them in the context of your extension; you are already just following them and then stripping the args off of the redirect...
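To make the "strip the args off of the redirect" step concrete, here is a minimal sketch in Python. It is not the extension's implementation; `strip_tracking` is a made-up name, and the parameter list (`utm_*`, `fbclid`, `gclid`) is only an assumed example of what might count as tracking args.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed examples of tracking parameters; the real list may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}

def strip_tracking(url):
    """Return the URL with the assumed tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL with only the surviving query parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

This would run on the final URL after the shortener has been followed, which is why a list of shortener domains alone doesn't help much here.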

Regarding your comment above: yeah, unless there's a way to expand or figure out what the ultimate target is from the shortened link itself, I'm not sure there's anything we can really do without following it first. Are you aware of any way?

Regarding this example https://app.instapage.com/route/9475232/?url=www.nat.ai/careers => http://www.nat.ai/careers:

  • Could you provide some more examples like this? I'm assuming the other ones you provided (youtu.be/foo, redd.it/foo) are as straightforward as they seem, but this one has a path in it, and one with a random-looking integer to boot.
  • Trying to figure out the pattern. For example, is it https://app.instapage.com/route/<integer>/?url=<target> or https://app.instapage.com/<random_path>/?url=<target> or literally always https://app.instapage.com/route/9475232/?url=<target>...

I'm pretty sure the integer is a customer number.

 https://app.instapage.com/route/4619387/?url=www.sweatflix.com/disclaimer
 https://app.instapage.com/route/7664466/?url=escuelachilenaoratoria.cl/Postgrados_Eneb
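Assuming the pattern really is https://app.instapage.com/route/<integer>/?url=<target>, as these examples suggest, the target could be pulled out of the `url` parameter directly. A sketch in Python under that assumption; `instapage_target` is a made-up name, and defaulting to `http://` when the target has no scheme is a guess based on the first example in this issue.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed path shape: /route/<integer>/ with an optional trailing slash.
ROUTE_PATH = re.compile(r"^/route/\d+/?$")

def instapage_target(url):
    """Return the redirect target from an Instapage route link, else None."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != "app.instapage.com":
        return None
    if not ROUTE_PATH.match(parts.path):
        return None
    values = parse_qs(parts.query).get("url")
    if not values:
        return None
    target = values[0]
    # The examples omit the scheme; assume http:// in that case.
    if "://" not in target:
        target = "http://" + target
    return target
```

One caveat: if the target carries its own unescaped `&`-separated query string, `parse_qs` will cut it off at the first `&`, so a real implementation would need to decide how to handle that.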

Instapage is a small business -- 615 hosts in the top 15 million -- so I probably should have left them off the list. Now that I look more carefully, more than half of Instapage sites only have a single page.

And you are correct, for the real shorteners there's no easy way to find out the link without asking the shortener. I'm thinking of providing a service to disintermediate, just because I (and every other search engine person) hate them with a passion...

iki commented

@newhouse there are also many more redirects in https://github.com/nokeya/direct-links-out. Should I convert them to a PR?