Inaccessible gender-aware namespace aliases
PeterBowman opened this issue · 2 comments
On some non-English language projects, a dedicated user namespace prefix alias is assigned to users who set their gender to female in their preferences. For instance, on plwiki, users with male or unspecified gender get the default Wikipedysta prefix, whereas female users are identified with Wikipedystka (cf. Benutzer/Benutzerin on German projects, Usuario/Usuaria on Spanish wikis, and so on).
Wiki.java automatically falls back to the default/male language-specific prefix upon normalization. This is no different from other normalization cases, e.g. (for plwiki) User -> Wikipedysta, wikipedysta -> Wikipedysta, Wikipedystka -> Wikipedysta. However, MediaWiki honors the gender setting when a user page is queried.
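The offline folding described above can be illustrated with a minimal sketch, assuming a hard-coded plwiki alias table for the User namespace. The class name OfflineNormalizer and the table contents are hypothetical and are not Wiki.java's actual normalize() implementation or data:

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: NOT Wiki.java's real normalize().
// Folds every known plwiki User-namespace alias to the canonical
// (default/male) prefix, with no knowledge of the user's gender setting.
class OfflineNormalizer {
    // Assumed alias table for plwiki's User namespace (ns=2), keyed in lower case.
    private static final Map<String, String> USER_ALIASES = Map.of(
        "user", "Wikipedysta",
        "wikipedysta", "Wikipedysta",
        "wikipedystka", "Wikipedysta");

    static String normalize(String title) {
        int colon = title.indexOf(':');
        if (colon < 0)
            return title; // no namespace prefix: left untouched in this sketch
        String prefix = title.substring(0, colon).trim().toLowerCase(Locale.ROOT);
        String canonical = USER_ALIASES.get(prefix);
        if (canonical == null)
            return title; // other namespaces are out of scope for this sketch
        return canonical + ":" + title.substring(colon + 1).trim();
    }
}
```

With this kind of folding, User:Cancre and Wikipedystka:Cancre both become Wikipedysta:Cancre, while MediaWiki itself reports the gender-specific Wikipedystka:Cancre for a female user.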
Let's query w:pl:User:Cancre (displayed on-wiki as Wikipedystka:Cancre, the female prefix alias) and, for comparison, w:pl:User:Przykuta (Wikipedysta:Przykuta, the male/default prefix) via api.php:
<?xml version="1.0"?>
<api batchcomplete="">
  <query>
    <normalized>
      <n from="User:Cancre" to="Wikipedystka:Cancre" />
      <n from="User:Przykuta" to="Wikipedysta:Przykuta" />
    </normalized>
    <pages>
      <page _idx="320152" pageid="320152" ns="2" title="Wikipedystka:Cancre" />
      <page _idx="93794" pageid="93794" ns="2" title="Wikipedysta:Przykuta" />
    </pages>
  </query>
</api>
Wiki.java expects the normalized page name to fall back to the male/default prefix as well (Wikipedysta:Cancre). It cannot find that title in the pages array, though, because the API applies the gender alias in this namespace and returns Wikipedystka:Cancre instead. Example:
import java.util.List;
import org.wikipedia.Wiki;
var wiki = Wiki.newSession("pl.wikipedia.org");
wiki.getPageInfo(List.of("User:Cancre", "User:Przykuta")).forEach(System.out::println);
Result (first line refers to User:Cancre):
null
{redirect=false, size=550, lastpurged=2018-09-06T04:27:13Z, exists=true, watchers=159, protection={editexpiry=null, move=autoconfirmed, edit=autoconfirmed, cascade=false, moveexpiry=null}, pageid=93794, displaytitle=Wikipedysta:Przykuta, lastrevid=44294744, inputpagename=User:Przykuta, pagename=Wikipedysta:Przykuta, timestamp=2021-04-02T19:03:26.546280+02:00}
Reason: Wiki.java calls normalize() internally and reorders the query results to match the input titles. This normalize() method does not take into account the gender of the user that a user page belongs to. This pattern appears in several places, e.g. getPageInfo() (see wiki-java/src/org/wikipedia/Wiki.java, lines 1754 to 1763 in c8cc5a1).
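The lines referenced in Wiki.java are not reproduced here. The following is only a hypothetical approximation of the pattern described: result pages are keyed by the title the API returned, then looked up with the offline-normalized input title. The class ReorderDemo, its stand-in normalize() and the placeholder per-page data are assumptions for illustration; the point is to show why the female-aliased page comes back as null:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical approximation of the lookup pattern; NOT the actual Wiki.java code.
class ReorderDemo {
    // Stand-in for offline normalization: folds plwiki User aliases to the
    // default prefix, ignoring gender (assumption for illustration).
    static String normalize(String title) {
        String[] parts = title.split(":", 2);
        if (parts.length == 2 && List.of("user", "wikipedysta", "wikipedystka")
                .contains(parts[0].toLowerCase(Locale.ROOT)))
            return "Wikipedysta:" + parts[1];
        return title;
    }

    // Reorders per-page results to match the input order, keyed by normalize(input).
    static List<String> reorder(List<String> inputs, List<String> apiTitles) {
        Map<String, String> byTitle = new HashMap<>();
        for (String t : apiTitles)
            byTitle.put(t, "info for " + t); // placeholder for the per-page data
        List<String> out = new ArrayList<>();
        for (String in : inputs)
            out.add(byTitle.get(normalize(in))); // null if the API used another alias
        return out;
    }
}
```

Calling reorder(List.of("User:Cancre", "User:Przykuta"), List.of("Wikipedystka:Cancre", "Wikipedysta:Przykuta")) yields null for the first entry and the Przykuta data for the second, mirroring the getPageInfo() output shown earlier.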
Since getPageInfo() is always called internally by edit(), this bug makes it impossible to edit user pages prefixed with female aliases on gender-aware language wikis.
Possible solution: parse the <normalized> element, if present, and use that information instead of normalize() to link query results with input titles. I'd implement some sort of resolveNormalizedParser() helper method for this purpose (analogous to resolveRedirectParser()). The existing normalize() method would be explicitly documented as serving limited, offline title normalization purposes, with a remark that it is not fully aware of certain quirks (such as gender aliasing), since it has no access to users' gender preferences.
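A minimal sketch of the proposed helper, assuming the API response is available as an XML string and parsed with the JDK's DOM API. Only the name resolveNormalizedParser comes from the proposal; its signature, the companion resolve() method and the DOM-based parsing are assumptions, and this is not necessarily how Wiki.java parses responses:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the proposed helper (signature assumed).
class NormalizedResolver {
    // Maps each input title to the title the server normalized it to,
    // as reported by the <n from=... to=...> entries under <normalized>.
    static Map<String, String> resolveNormalizedParser(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> mapping = new HashMap<>();
        NodeList blocks = doc.getElementsByTagName("normalized");
        if (blocks.getLength() == 0)
            return mapping; // no input title was changed server-side
        NodeList entries = ((Element) blocks.item(0)).getElementsByTagName("n");
        for (int i = 0; i < entries.getLength(); i++) {
            Element n = (Element) entries.item(i);
            mapping.put(n.getAttribute("from"), n.getAttribute("to"));
        }
        return mapping;
    }

    // Links an input title to the title used under <pages>: use the server's
    // mapping when present, otherwise the input was already in normalized form.
    static String resolve(Map<String, String> normalized, String input) {
        return normalized.getOrDefault(input, input);
    }
}
```

Applied to the plwiki response above, User:Cancre resolves to Wikipedystka:Cancre, which matches the title under <pages>. Falling back to the input title is deliberate: the API lists only titles that normalization actually changed.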
Bonus: fixing this would also solve #162.
@MER-C are you OK with this proposal? I'd be happy to work on a patch if so.
Sounds good.