arthurpsmith/wikidata-tools

Security issue: the tool can be used to bypass spam filters

wetneb opened this issue · 6 comments

Because the URL prefix can be passed as a parameter, it is possible to redirect to arbitrary websites. For instance, one can bypass the (very annoying) ban of TinyURL on Wikidata with this link:
http://tools.wmflabs.org/wikidata-externalid-url/?p=213&url_prefix=https://tinyurl.com/&id=hb9897q

Now in the case of TinyURL I think that's more of a feature than a bug! (We could make a small template that exploits that, so that we could include links to queries via {{tinyurl|hb9897q}}.) But I suspect this violates some policy.

It is fairly simple to solve this: instead of reading url_prefix from the query, just store the mapping from property id to url prefix, and use that to retrieve the prefix.
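
Something along these lines, for example (the prefixes below are just illustrative, not the tool's actual table):

```python
# Sketch: resolve the prefix from a fixed server-side map keyed by
# property ID, instead of trusting a url_prefix query parameter.
# The prefix values here are illustrative placeholders.
KNOWN_PREFIXES = {
    '213': 'http://www.isni.org/',    # P213 (ISNI)
    '214': 'https://viaf.org/viaf/',  # P214 (VIAF)
}

def resolve_redirect(prop_id, external_id):
    prefix = KNOWN_PREFIXES.get(prop_id)
    if prefix is None:
        raise ValueError('unsupported property: P%s' % prop_id)
    return prefix + external_id
```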

That's not as simple as it sounds - the tool is used for more than the listed collection of properties, because it also works around some URL-encoding issues that the standard Wikidata UI has trouble with. A better option might be to have it look up the blacklist and block redirects to blacklisted sites - any suggestions on how to do that?

I would just enforce that the tool is only used for the listed properties, I think.

It's already being used for more than the listed properties - 25 in total, according to this query:
https://query.wikidata.org/#select%20%3Fprop%20%3FpropLabel%20%3Fformatter%20WHERE%20%7B%0A%20%20%3Fprop%20wdt%3AP1630%20%3Fformatter%20.%0A%20%20FILTER%28contains%28str%28%3Fformatter%29%2C%20%27wikidata-externalid-url%27%29%29%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%20%20%7D
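
Decoded, that query is:

```sparql
select ?prop ?propLabel ?formatter WHERE {
  ?prop wdt:P1630 ?formatter .
  FILTER(contains(str(?formatter), 'wikidata-externalid-url'))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```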

That's about double what is listed. It's also rather constraining - it means that if an id's location changes, the only way to fix the links is with a code change. I'll look into adding a blacklist.

Wow - is this the list? https://meta.wikimedia.org/wiki/Spam_blacklist - it's enormous! Almost 10,000 entries. Hmm.

A possible solution would be for all requests using url_prefix to check the resulting URI against the blacklist via the MediaWiki API, redirecting only if the check passes.
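
Roughly like this, perhaps - assuming the SpamBlacklist extension's API module is exposed on Meta, and guessing at the response shape:

```python
# Rough sketch: ask Meta's SpamBlacklist API module whether a URL would
# be caught by the blacklist, and only redirect when the answer is "ok".
# The endpoint and response fields here are assumptions to verify.
import requests

API = 'https://meta.wikimedia.org/w/api.php'

def is_blacklisted(url):
    r = requests.get(API, params={
        'action': 'spamblacklist',
        'url': url,
        'format': 'json',
    })
    r.raise_for_status()
    return r.json()['spamblacklist'].get('result') != 'ok'
```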

Alternatively, the check could even be offloaded to the client, by returning an HTML page plus JavaScript that performs the check and redirects if it passes. The downside of this solution is that it would require JavaScript to be enabled in the browser for all such redirects.

Sorry to take so long looking into this - a blacklist along these lines (with caching) is now implemented.
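
For the record, the shape of it is roughly this - fetch the raw list, compile the entries into one regex, and cache the compiled pattern for a while (the URL, TTL, and regex construction here are illustrative rather than the exact code):

```python
# Sketch of a cached blacklist check: fetch the raw list from Meta,
# compile the non-comment lines into a single regex, and reuse the
# compiled pattern for an hour. URL, TTL, and regex shape are assumptions.
import re
import time
import requests

BLACKLIST_URL = ('https://meta.wikimedia.org/w/index.php'
                 '?title=Spam_blacklist&action=raw')
CACHE_TTL = 3600  # seconds
_cache = {'pattern': None, 'fetched': 0.0}

def _blacklist_pattern():
    if _cache['pattern'] is None or time.time() - _cache['fetched'] > CACHE_TTL:
        text = requests.get(BLACKLIST_URL).text
        entries = [line.split('#', 1)[0].strip()
                   for line in text.splitlines()]
        entries = [e for e in entries if e]
        # Each entry is a regex fragment matched against the URL, in the
        # same spirit as the SpamBlacklist extension's combined pattern.
        _cache['pattern'] = re.compile(
            r'https?://[a-z0-9\-.]*(?:' + '|'.join(entries) + r')', re.I)
        _cache['fetched'] = time.time()
    return _cache['pattern']

def is_spam(url):
    return _blacklist_pattern().search(url) is not None
```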