WbGetEntitiesAction cannot search media-info for titles containing "-"
don-vip opened this issue · 3 comments
don-vip commented
This code fails to retrieve the media-info for this file:
@Test
void testGetMediaInfoDocument() throws Exception {
    assertNotNull(WikibaseDataFetcher.getWikimediaCommonsDataFetcher().getEntityDocumentByTitle(
            "commonswiki",
            "File:IAU_2006_General_Assembly-_Result_of_the_IAU_Resolution_Votes_(iau0603d).jpg"));
}
It's because of this split, line 169:
List<String> titlesList = titles == null ? Collections.emptyList() : Arrays.asList(titles.split("-"));
As the file name contains a dash, the map key does not match the full title.
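To make the mismatch concrete, here is a minimal standalone sketch of what that split does to this title. The class `DashSplitDemo` and the helper `splitTitles` are hypothetical names used only for illustration; they are not part of Wikidata-Toolkit, and the helper only copies the expression from the line quoted above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DashSplitDemo {
    // Hypothetical helper mirroring the expression quoted from line 169.
    static List<String> splitTitles(String titles) {
        return titles == null ? Collections.emptyList() : Arrays.asList(titles.split("-"));
    }

    public static void main(String[] args) {
        String title = "File:IAU_2006_General_Assembly-_Result_of_the_IAU_Resolution_Votes_(iau0603d).jpg";
        List<String> keys = splitTitles(title);
        // The single title is cut at the dash into two fragments.
        System.out.println(keys);
        // Neither fragment equals the full title, so a lookup by the full title misses.
        System.out.println(keys.contains(title));
    }
}
```

The first fragment, `File:IAU_2006_General_Assembly`, is what ends up being usable as a key, which is why a lookup with the full title finds nothing.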
don-vip commented
As a workaround I use this hack:
static MediaInfoDocument getMediaInfoDocument(String filename) throws MediaWikiApiErrorException, IOException {
    // workaround to https://github.com/Wikidata/Wikidata-Toolkit/issues/777
    String title = "File:" + filename;
    if (fetcher.getEntityDocumentsByTitle("commonswiki", title)
            .get(title.split("-")[0]) instanceof MediaInfoDocument doc) {
        return doc;
    } else {
        throw new IllegalStateException("No commons mediaInfo found for filename " + filename);
    }
}
wetneb commented
Thanks @don-vip! What an honor to have a JOSM developer in this modest repository! :)
This titles.split("-") should just be titles.split("|"). Titles cannot contain the | character, and this is the one that is used to encode lists of parameters in this API. See https://commons.wikimedia.org/w/api.php?action=help&modules=wbgetentities
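One caveat when applying this in Java: String.split treats its argument as a regular expression, and an unescaped | is the alternation operator, so a literal pipe separator probably needs to be quoted. Below is a minimal sketch of that idea. The class `PipeSplitDemo` and the helper `splitTitles` are hypothetical and do not show the actual change made in Wikidata-Toolkit.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Hypothetical helper: splits a pipe-separated titles parameter on the literal '|'.
    static List<String> splitTitles(String titles) {
        return titles == null
                ? Collections.emptyList()
                : Arrays.asList(titles.split(Pattern.quote("|")));
    }

    public static void main(String[] args) {
        String first = "File:IAU_2006_General_Assembly-_Result_of_the_IAU_Resolution_Votes_(iau0603d).jpg";
        String second = "File:Example.jpg";
        // Dashes inside a title are left alone; only the pipe separates titles.
        System.out.println(splitTitles(first + "|" + second));
        // For comparison: the unquoted pattern "|" matches the empty string,
        // so the input is cut between characters instead of at the pipe.
        System.out.println(Arrays.toString("a|b".split("|")));
    }
}
```

Writing the pattern as `"\\|"` should be equivalent to `Pattern.quote("|")`; the quoted form is used here only because it makes the intent explicit.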