MER-C/wiki-java

Wiki.getPageInfo() chokes on HTML entities

PeterBowman opened this issue · 2 comments

This method (and perhaps others, too) builds an internal map of API results whose keys are page titles as normalized by the MW server. It relies on Wiki.normalize(String) to match each result to a requested title, and fills the corresponding slot of the output array with null whenever that lookup fails.

Map<String, Object>[] info = new HashMap[pages.length];
// Reorder. Make a new HashMap so that inputpagename remains unique.
for (int i = 0; i < pages2.length; i++)
{
    Map<String, Object> tempmap = metamap.get(normalize(pages2[i]));
    if (tempmap != null)
    {
        info[i] = new HashMap<>(tempmap);
        info[i].put("inputpagename", pages[i]);
    }
}
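
To make the failure mode concrete, here is a simplified, self-contained sketch; the class and the literal keys below are illustrative assumptions, not the library's internals:

import java.util.HashMap;
import java.util.Map;

public class KeyMismatchDemo
{
    public static void main(String[] args)
    {
        // Key as normalized by the MediaWiki server: the &nbsp; references
        // are resolved and, per MediaWiki title normalization, mapped to
        // ordinary spaces.
        Map<String, Map<String, Object>> metamap = new HashMap<>();
        metamap.put("1 000 000 000", new HashMap<>());

        // Key as produced by a local normalize() that leaves entities alone.
        String localKey = "1&nbsp;000&nbsp;000&nbsp;000";

        // The lookup misses, so the corresponding output slot stays null.
        System.out.println(metamap.get(localKey)); // prints "null"
    }
}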

I found out that titles containing HTML entities, when passed to Wiki.getPageInfo(), are correctly encoded in the POST request and processed by MW into an info object, which is then read by Wiki.makeApiCall(). However, I get a null value on line 1708 because Wiki.normalize() does not resolve such entities, so the key it produces never matches the title as normalized by the server.

Example:

Wiki wiki = Wiki.createInstance("pl.wiktionary.org");
wiki.version(); // just in case, this is a wgCapitalLinks=false wiki
wiki.getPageInfo(new String[] { "1 000 000 000", "1&nbsp;000&nbsp;000&nbsp;000" });

Output:

{size=27, lastpurged=2017-12-16T20:13:11Z, exists=true, protection={cascade=false}, pageid=129007, displaytitle=1000000000, lastrevid=655176, inputpagename=1000000000, pagename=1000000000, timestamp=2018-09-24T19:02:55.981+02:00}
null

The same happens for &copy; (©).

This issue propagates to Wiki.exists(). The following call causes a NullPointerException, which I managed to avoid via PeterBowman@8ccae5a:

wiki.exists(new String[] { "1&nbsp;000&nbsp;000&nbsp;000" });
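
For reference, a null-safe guard of the following shape avoids the crash; existsDefensive is a hypothetical helper sketched against the "exists" key seen in the output above, not necessarily the change made in PeterBowman@8ccae5a:

import java.util.Map;

public class ExistsGuard
{
    // Hypothetical guard: treat a missing (null) info entry as "page does
    // not exist" instead of dereferencing it and throwing.
    public static boolean[] existsDefensive(Map<String, Object>[] info)
    {
        boolean[] result = new boolean[info.length];
        for (int i = 0; i < info.length; i++)
            result[i] = info[i] != null && Boolean.TRUE.equals(info[i].get("exists"));
        return result;
    }
}
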
MER-C commented

Acknowledged. Not sure yet what the best solution is that does not require external dependencies.
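
One dependency-free direction, sketched here only as a possibility and using nothing outside java.util and java.util.regex, would be to resolve a small set of character references before the existing normalization (the class name and the entity table are assumptions):

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder
{
    // Hand-rolled table of a few named references; extend as needed.
    private static final Map<String, String> NAMED = new HashMap<>();
    static
    {
        NAMED.put("nbsp", "\u00A0");
        NAMED.put("copy", "\u00A9");
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
    }

    private static final Pattern ENTITY =
        Pattern.compile("&(#\\d+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);");

    // Resolves numeric character references and the named ones listed above;
    // anything unrecognized is left untouched.
    public static String decode(String title)
    {
        Matcher m = ENTITY.matcher(title);
        StringBuffer sb = new StringBuffer();
        while (m.find())
        {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X"))
                replacement = new String(Character.toChars(Integer.parseInt(body.substring(2), 16)));
            else if (body.startsWith("#"))
                replacement = new String(Character.toChars(Integer.parseInt(body.substring(1))));
            else
                replacement = NAMED.getOrDefault(body, m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}

Note that MediaWiki's own title normalization additionally maps some whitespace (including the U+00A0 produced by &nbsp;) to plain spaces, so a real fix would have to mirror that step as well rather than stop at entity decoding.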