evazion/translate-pixiv-tags

Artist URL results need validation

Closed this issue · 4 comments

I noticed that URL searches for some domains match very loosely, which causes false matches in some cases. In this case it was a Medibang URL from SauceNAO, although there are probably a number of other domains with the same problem.
http://saucenao.com/search.php?db=999&url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FD6SULOHUUAACSQa.jpg
https://danbooru.donmai.us/artists?search%5Burl_matches%5D=https%3A%2F%2Fmedibang.com%2Fauthor%2F5142710

Therefore, the results should be filtered first, keeping only the artists that actually have the searched URL, before the artist names are rendered.

// Keep only the artists that have at least one URL matching the searched URL.
function ValidateFilterArtists(artistlist,url) {
    // Shallow copy, so the caller's list is left untouched.
    var templist = jQuery.extend([], artistlist);
    // An anchor element exposes the parsed host, pathname, and search of a URL.
    var parseurl = $("<a>").attr('href',url)[0];
    // Iterate backwards so that splicing does not shift the indexes still to be visited.
    for (let i = artistlist.length - 1; i >= 0; i--) {
        let found = false;
        for (let j = 0; j < artistlist[i]['urls'].length; j++) {
            var parseregular = $("<a>").attr('href',artistlist[i]['urls'][j]['url'])[0];
            var parsenormal = $("<a>").attr('href',artistlist[i]['urls'][j]['normalized_url'])[0];
            // The protocol is deliberately not compared, so http and https URLs can match.
            if ((parseregular.host == parseurl.host && parseregular.pathname == parseurl.pathname && parseregular.search == parseurl.search) ||
                (parsenormal.host == parseurl.host && parsenormal.pathname == parseurl.pathname && parsenormal.search == parseurl.search)) {
                found = true;
                break;
            }
        }
        if (!found) {
            templist.splice(i,1);
        }
    }
    return templist;
}

The above is pretty close to the code I use to validate artist results when searching by URL.

Another solution might be to disallow URL searches for domains (e.g. Medibang) with that problem.
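
To illustrate that alternative, here is a minimal sketch of a domain blocklist. The names `LOOSE_MATCH_DOMAINS` and `isSearchDisallowed` are hypothetical, and Medibang is listed only because it is the one domain named in this thread; it uses the standard `URL` class rather than anything from the userscript.

```javascript
// Assumption: medibang.com is the only domain named in this thread; the list is illustrative.
const LOOSE_MATCH_DOMAINS = ['medibang.com'];

// Hypothetical helper: true when the URL's host is a listed domain or one of its subdomains.
function isSearchDisallowed(url) {
    const host = new URL(url).hostname;
    return LOOSE_MATCH_DOMAINS.some((domain) => host === domain || host.endsWith('.' + domain));
}
```

The drawback of this approach is that valid artist links on a listed domain would never be looked up at all.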

7nik commented

Why not just reject results that contain 10 artists? That should cover all such cases.
A possible counter-case is a link that belongs to a circle, though I'm not sure what happens in that case.

I suppose that's workable, though the number of results returned could change in the future. But I doubt there's a valid link which would actually be checked that would return so many results.

However, for myself at least, having such a loose and indirect method of validation feels a bit discomforting. That's just me though.
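
For comparison, the result-count heuristic suggested above might look like the sketch below. `ASSUMED_PAGE_LIMIT` and `rejectFullPages` are hypothetical names, and the value 10 is taken from the comment above rather than from any documented limit.

```javascript
// Assumption: a loosely matched URL search fills a whole result page;
// 10 is the figure quoted in the comment above, not a documented constant.
const ASSUMED_PAGE_LIMIT = 10;

// Hypothetical helper: treat a full page of results as a false match and discard it.
function rejectFullPages(artistlist, limit = ASSUMED_PAGE_LIMIT) {
    return artistlist.length >= limit ? [] : artistlist;
}
```

As noted, this would stop working if the number of results returned per page changed.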

7nik commented

I tested your solution and found the following problem:
On https://www.artstation.com/aoiogata the userscript should add aoi ogata, but it doesn't, because his ArtStation links are
http://aoiogata.artstation.com/
https://www.artstation.com/artist/aoiogata
and both are normalized to
http://www.artstation.com/aoiogata/
but neither of them is equal to
https://www.artstation.com/aoiogata

Yeah, to be honest, the full solution in my code also includes a massive regex dictionary that breaks down all artist links before doing the comparison, but that might be beyond the scope of such a small project.

However, I made a small fix to the above code: it now strips the final forward slash, if there is one, from the URLs before comparing them.

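// Keep only the artists that have at least one URL matching the searched URL,
// ignoring a trailing slash on either side of the comparison.
function ValidateFilterArtists(artistlist,url) {
    // Shallow copy, so the caller's list is left untouched.
    var templist = jQuery.extend([], artistlist);
    // Strip the trailing slash from the searched URL too, so both sides are treated alike.
    var parseurl = $("<a>").attr('href',url.replace(/\/$/,''))[0];
    // Iterate backwards so that splicing does not shift the indexes still to be visited.
    for (let i = artistlist.length - 1; i >= 0; i--) {
        let found = false;
        for (let j = 0; j < artistlist[i]['urls'].length; j++) {
            var parseregular = $("<a>").attr('href',artistlist[i]['urls'][j]['url'].replace(/\/$/,''))[0];
            var parsenormal = $("<a>").attr('href',artistlist[i]['urls'][j]['normalized_url'].replace(/\/$/,''))[0];
            // The protocol is deliberately not compared, so http and https URLs can match.
            if ((parseregular.host == parseurl.host && parseregular.pathname == parseurl.pathname && parseregular.search == parseurl.search) ||
                (parsenormal.host == parseurl.host && parsenormal.pathname == parseurl.pathname && parsenormal.search == parseurl.search)) {
                found = true;
                break;
            }
        }
        if (!found) {
            templist.splice(i,1);
        }
    }
    return templist;
}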

I tested it with the artist and URL you mentioned, and it successfully returns the artist as expected.
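
For anyone who wants to try the comparison outside the userscript, here is a rough jQuery-free sketch of the same idea using the standard `URL` class. The names `urlKey` and `filterArtistsByUrl` are hypothetical, the second artist entry is made up for contrast, and the sketch assumes every URL entry has both `url` and `normalized_url` set to a valid absolute URL (`new URL` throws otherwise). Unlike the function above, it strips the trailing slash from the parsed pathname rather than from the raw string.

```javascript
// Sketch only: urlKey and filterArtistsByUrl are hypothetical names, not part of the userscript.
// Reduce a URL to the parts being compared: host, pathname without a trailing slash, and query string.
function urlKey(rawUrl) {
    const parsed = new URL(rawUrl);
    return parsed.host + parsed.pathname.replace(/\/$/, '') + parsed.search;
}

// Keep only the artists with at least one URL (regular or normalized) equal to the searched URL.
function filterArtistsByUrl(artistlist, url) {
    const wanted = urlKey(url);
    return artistlist.filter((artist) =>
        artist.urls.some((entry) =>
            urlKey(entry.url) === wanted || urlKey(entry.normalized_url) === wanted));
}

// The first entry uses the links quoted earlier in this thread; the second is invented for contrast.
const artists = [
    {name: 'aoi ogata', urls: [
        {url: 'http://aoiogata.artstation.com/', normalized_url: 'http://www.artstation.com/aoiogata/'},
        {url: 'https://www.artstation.com/artist/aoiogata', normalized_url: 'http://www.artstation.com/aoiogata/'},
    ]},
    {name: 'unrelated artist', urls: [
        {url: 'https://www.artstation.com/someoneelse', normalized_url: 'http://www.artstation.com/someoneelse/'},
    ]},
];

const matched = filterArtistsByUrl(artists, 'https://www.artstation.com/aoiogata');
console.log(matched.map((artist) => artist.name));
```

Only the normalized URL matches here, and only because the trailing slash is ignored; the protocol difference (http versus https) is not compared at all.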