Parse page titles properly
lahwaacz commented
- canonicalize (uppercase the first letter, convert underscores to spaces)
- percent-decoding for page titles (e.g. `[[GTK%2B]]` → `[[GTK+]]`)
- wrapper class for the title, exposing its `{{DISPLAYTITLE}}`, `{{PAGENAME}}`, `{{FULLPAGENAMEE}}`, `{{BASEPAGENAMEE}}`, `{{SUBPAGENAMEE}}`, `{{SUBJECTPAGENAMEE}}`, `{{TALKPAGENAMEE}}`, `{{ROOTPAGENAME}}` attributes
  - https://www.mediawiki.org/wiki/Help:Magic_words#Page_names
  - http://pythonhosted.org/mediawiki-utilities/lib/title.html#mw-lib-title
  - implemented in `ws.parser_helpers.title`
- handle leading colon (should be automatic for categories and files/images)
- double colons, e.g. `[[wikipedia::Help:Editing]]` → `[[wikipedia:Help:Editing]]`
- method to detect namespace
- handle namespace aliases
- handle relative links: `[[/bar]]` on page `foo` should be equivalent to `[[foo/bar]]`
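The canonicalization and percent-decoding items could look roughly like the sketch below. It is not the code in `ws.parser_helpers.title`; the function name `canonicalize_title` is hypothetical, and it assumes the wiki capitalizes first letters (MediaWiki's default) and that the namespace prefix has already been split off, since capitalization applies to the part after the prefix.

```python
from urllib.parse import unquote


def canonicalize_title(title: str) -> str:
    """Hypothetical helper: normalize a page title (namespace already removed)."""
    # Percent-decode first, so "GTK%2B" becomes "GTK+". unquote() keeps "+"
    # as a literal plus sign (unlike unquote_plus(), which turns it into a space).
    title = unquote(title)
    # Underscores and spaces are interchangeable in MediaWiki titles.
    title = title.replace("_", " ")
    # Collapse runs of whitespace and trim both ends.
    title = " ".join(title.split())
    # Assumption: first-letter capitalization is enabled; only the first
    # character is uppercased, the rest of the title stays case-sensitive.
    if title:
        title = title[0].upper() + title[1:]
    return title
```

For example, `canonicalize_title("GTK%2B")` gives `"GTK+"` and `canonicalize_title("foo__bar")` gives `"Foo bar"`.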
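A wrapper class exposing the page-name magic words might be sketched as follows. The class `Title` and its property names are hypothetical, not the API of `ws.parser_helpers.title` or mediawiki-utilities. It assumes English namespace names (talk namespaces named "Talk" or "<Namespace> talk"), subpages enabled everywhere, and an already canonicalized namespace and page name. `{{DISPLAYTITLE}}` is omitted because it depends on page content, not on the title. The `encode` helper only approximates the URL encoding used by the "EE" variants such as `{{FULLPAGENAMEE}}`.

```python
from urllib.parse import quote


class Title:
    """Hypothetical wrapper mirroring MediaWiki's page-name magic words."""

    def __init__(self, namespace: str, pagename: str) -> None:
        self.namespace = namespace  # "" for the main namespace
        self.pagename = pagename    # {{PAGENAME}}

    @staticmethod
    def _join(namespace: str, pagename: str) -> str:
        return f"{namespace}:{pagename}" if namespace else pagename

    @property
    def fullpagename(self) -> str:  # {{FULLPAGENAME}}
        return self._join(self.namespace, self.pagename)

    @property
    def basepagename(self) -> str:  # {{BASEPAGENAME}}: drop the last subpage level
        return self.pagename.rsplit("/", 1)[0]

    @property
    def rootpagename(self) -> str:  # {{ROOTPAGENAME}}: drop all subpage levels
        return self.pagename.split("/", 1)[0]

    @property
    def subpagename(self) -> str:  # {{SUBPAGENAME}}: the last subpage level only
        return self.pagename.rsplit("/", 1)[-1]

    @property
    def is_talk(self) -> bool:
        return self.namespace == "Talk" or self.namespace.endswith(" talk")

    @property
    def subjectpagename(self) -> str:  # {{SUBJECTPAGENAME}}
        ns = self.namespace
        if ns == "Talk":
            ns = ""
        elif ns.endswith(" talk"):
            ns = ns[: -len(" talk")]
        return self._join(ns, self.pagename)

    @property
    def talkpagename(self) -> str:  # {{TALKPAGENAME}}
        if self.is_talk:
            return self.fullpagename
        ns = f"{self.namespace} talk" if self.namespace else "Talk"
        return self._join(ns, self.pagename)

    @staticmethod
    def encode(name: str) -> str:
        # Approximation of the "EE" variants: spaces become underscores and the
        # rest is percent-encoded, leaving a few punctuation characters as-is.
        return quote(name.replace(" ", "_"), safe=";@$!*(),/:")
```

For instance, `Title("Help", "Title/foo/bar").basepagename` is `"Title/foo"`, and `Title.encode(Title("Help", "GTK+").fullpagename)` yields `"Help:GTK%2B"`.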
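The leading-colon, double-colon, namespace-detection and alias items can share one splitting step. The sketch below is only an illustration: `split_link_target`, `ParsedLink`, `NAMESPACES` and `INTERWIKI_PREFIXES` are hypothetical names, and the hard-coded tables stand in for the namespace, alias and interwiki data that would normally come from the wiki itself. Namespace and interwiki prefixes are matched case-insensitively.

```python
from typing import NamedTuple

# Hypothetical lookup tables, keyed by lowercase name or alias.
NAMESPACES = {
    "talk": "Talk",
    "user": "User",
    "help": "Help",
    "category": "Category",
    "file": "File",
    "image": "File",  # alias: "Image:" resolves to "File:"
}
INTERWIKI_PREFIXES = {"wikipedia"}


class ParsedLink(NamedTuple):
    leading_colon: bool
    interwiki: str
    namespace: str
    pagename: str


def split_link_target(target: str) -> ParsedLink:
    """Split a link target into leading colon, interwiki, namespace and page name."""
    target = target.strip()
    # A leading colon, as in [[:Category:Foo]], links to the page instead of
    # categorizing (or embedding a file/image).
    leading_colon = target.startswith(":")
    if leading_colon:
        target = target[1:].lstrip()

    prefix, sep, rest = target.partition(":")
    key = prefix.strip().replace("_", " ").lower()
    if sep and key in INTERWIKI_PREFIXES:
        # Tolerate a doubled colon: "wikipedia::Help:Editing" -> "Help:Editing".
        # The remainder belongs to the remote wiki, so it is not split further.
        return ParsedLink(leading_colon, key, "", rest.lstrip(":").strip())
    if sep and key in NAMESPACES:
        # Namespace aliases map to their canonical name here.
        return ParsedLink(leading_colon, "", NAMESPACES[key], rest.strip())
    return ParsedLink(leading_colon, "", "", target)
```

With these tables, `split_link_target(":Category:Foo")` returns `ParsedLink(leading_colon=True, interwiki='', namespace='Category', pagename='Foo')`.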
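The relative-link item reduces to prefixing the current page name. This sketch covers only the leading-slash form from the issue; the function name `resolve_relative_link` is hypothetical, it assumes subpages are enabled in the current page's namespace, and it does not handle `../` parent links.

```python
def resolve_relative_link(target: str, current_page: str) -> str:
    """Hypothetical helper: resolve a subpage link like [[/bar]] against current_page."""
    if target.startswith("/"):
        # "/bar" on page "foo" refers to "foo/bar". A trailing slash, as in
        # [[/bar/]], only changes the displayed link text, so it is dropped.
        return (current_page + target).rstrip("/")
    # Anything else is not a subpage link and is returned unchanged.
    return target
```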