Wikidata/Wikidata-Toolkit

WbGetEntitiesAction cannot search media-info for titles containing "-"

don-vip opened this issue · 3 comments

This code fails to retrieve the media-info for this file:

@Test
void testGetMediaInfoDocument() throws Exception {
    assertNotNull(WikibaseDataFetcher.getWikimediaCommonsDataFetcher().getEntityDocumentByTitle(
            "commonswiki", 
            "File:IAU_2006_General_Assembly-_Result_of_the_IAU_Resolution_Votes_(iau0603d).jpg"));
}

It's because of this split, line 169:

List<String> titlesList = titles == null ? Collections.emptyList() : Arrays.asList(titles.split("-"));

As the file name contains a dash, the map key does not match the full title.
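The mismatch can be reproduced with the JDK alone. This is only a sketch: the class name `DashSplitDemo` and the `byTitle` map are hypothetical stand-ins for the toolkit's internals, and only the split expression is taken from the line quoted above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DashSplitDemo {
    // Same expression as the quoted line: the titles string is split on "-".
    static List<String> splitTitles(String titles) {
        return titles == null ? Collections.emptyList() : Arrays.asList(titles.split("-"));
    }

    public static void main(String[] args) {
        String title = "File:IAU_2006_General_Assembly-_Result_of_the_IAU_Resolution_Votes_(iau0603d).jpg";
        List<String> parts = splitTitles(title);

        // Hypothetical result map keyed by the split fragments (assumption,
        // not the toolkit's actual data structure).
        Map<String, String> byTitle = new HashMap<>();
        for (String part : parts) {
            byTitle.put(part, "document for " + part);
        }

        System.out.println(parts);                       // two fragments, not one title
        System.out.println(byTitle.containsKey(title));  // the full title is not a key
    }
}
```

A single title containing one dash comes back as two fragments, so a lookup by the full title finds nothing, while a lookup by the first fragment (as in the workaround below) does.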

As a workaround I use this hack:

    static MediaInfoDocument getMediaInfoDocument(String filename) throws MediaWikiApiErrorException, IOException {
        // workaround to https://github.com/Wikidata/Wikidata-Toolkit/issues/777
        String title = "File:" + filename;
        if (fetcher.getEntityDocumentsByTitle("commonswiki", title)
                .get(title.split("-")[0]) instanceof MediaInfoDocument doc) {
            return doc;
        } else {
            throw new IllegalStateException("No commons mediaInfo found for filename " + filename);
        }
    }

Thanks @don-vip! What an honor to have a JOSM developer in this modest repository! :)

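This titles.split("-") should just split on the | character instead, i.e. titles.split("\\|") in Java, since String.split takes a regular expression and a bare "|" would not match a literal pipe. Titles cannot contain the | character, and it is the one used to encode lists of parameters in this API. See https://commons.wikimedia.org/w/api.php?action=help&modules=wbgetentities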
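For illustration, here is a minimal sketch of splitting on the pipe separator. The class name `PipeSplitDemo` and the sample titles are hypothetical, and this is not necessarily the exact change made in the toolkit. One Java detail matters: `String.split` takes a regular expression, where an unescaped `|` is the alternation operator, so the literal pipe has to be escaped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PipeSplitDemo {
    // Split a "|"-separated titles parameter; "\\|" matches a literal pipe.
    static List<String> splitTitles(String titles) {
        return titles == null ? Collections.emptyList() : Arrays.asList(titles.split("\\|"));
    }

    // Unescaped variant, kept only to show the pitfall: the regex "|" matches
    // the empty string, so the input is not split at the pipes as intended.
    static List<String> splitTitlesUnescaped(String titles) {
        return Arrays.asList(titles.split("|"));
    }

    public static void main(String[] args) {
        // Hypothetical sample input: two titles, the first containing a dash.
        String titles = "File:A-B.jpg|File:C.jpg";
        System.out.println(splitTitles(titles));
        System.out.println(splitTitlesUnescaped(titles).size());
    }
}
```

With the escaped separator, a title containing a dash stays intact and matches the key used for the lookup.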

Hi @wetneb, thank you! You're welcome, thank you for the quick fix 👍