LinkInfo: URL title matching overzealous in some cases (e.g. G+)

Closed this issue 12 years ago · 0 comments

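A URL like https://plus.google.com/+LinusTorvalds/posts/ByVPmsSeSEG has a path component ("+LinusTorvalds") that matches the beginning of the page title, because G+ titles put the person/page name first. In the log below, that short prefix match is enough for LinkInfo to decide the URL is not handled, even though the path component covers only a small part of the title. Maybe there should be some kind of percentage match threshold, below which the URL component is not considered a good substitute for the title?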

[2012/10/31 17:14:01] (csbot.pretty_log) [#cs-york-dev] <Alan> https://plus.google.com/+LinusTorvalds/posts/ByVPmsSeSEG
[2012/10/31 17:14:01] (requests.packages.urllib3.connectionpool) Starting new HTTPS connection (1): plus.google.com
[2012/10/31 17:14:02] (requests.packages.urllib3.connectionpool) "GET /+LinusTorvalds/posts/ByVPmsSeSEG HTTP/1.1" 200 None
[2012/10/31 17:14:02] (csbot.plugins.linkinfo) path part "linustorvalds" matches title "linustorvaldsgooglesowithevenatabletdoingxpixeldisplays"
[2012/10/31 17:14:02] (csbot.plugins.linkinfo) URL not handled: https://plus.google.com/+LinusTorvalds/posts/ByVPmsSeSEG
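
One possible shape for the threshold idea, as a rough sketch rather than the actual LinkInfo code. The function names, the 0.5 default, and the normalisation rule (lowercase, letters only, guessed from the strings in the log above) are all assumptions made for illustration:

```python
import re


def normalise(text):
    """Lowercase and keep only ASCII letters (assumed from the log output)."""
    return re.sub(r"[^a-z]", "", text.lower())


def url_part_covers_title(path_part, title, threshold=0.5):
    """Return True only if the path part matches the title AND covers
    at least `threshold` of the normalised title's length.

    `threshold` is a hypothetical tuning knob, not an existing setting.
    """
    part = normalise(path_part)
    norm_title = normalise(title)
    if not part or not norm_title:
        return False
    if part not in norm_title:
        return False
    # Fraction of the title that the URL component accounts for.
    return len(part) / len(norm_title) >= threshold


if __name__ == "__main__":
    g_plus_title = "linustorvaldsgooglesowithevenatabletdoingxpixeldisplays"
    # Short prefix of a long title: should no longer count as a match.
    print(url_part_covers_title("+LinusTorvalds", g_plus_title))
    # Slug that is essentially the whole title: still counts as a match.
    print(url_part_covers_title("some-article-title", "Some Article Title"))
```

With a check like this, the G+ path part would fall below the threshold (13 of 55 characters), so the title would still be worth showing, while slug-style URLs that spell out most of the title would continue to be treated as redundant.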