WikiTalkParser is a library for extracting and parsing Wikipedia talk pages. It identifies each comment together with its signature, its date and its indentation level in the thread structure. In the current version, talk pages are retrieved from the Wikipedia API, given a list of articles as input. Only the English language version is supported.
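To illustrate what identifying a comment's signature, date and indentation can look like, here is a minimal sketch. It is not WikiTalkParser's actual code or API: the function name `parse_comment_line` and both regular expressions are hypothetical, and the sketch assumes the default English Wikipedia signature format (a link to the user page followed by a UTC timestamp) and replies indented with leading colons.

```python
import re

# Assumed default English Wikipedia timestamp, e.g. "14:05, 3 March 2011 (UTC)"
TIMESTAMP_RE = re.compile(r"(\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\))")
# Link to a user page or user talk page, e.g. "[[User:Alice|Alice]]"
USER_RE = re.compile(r"\[\[User(?: talk)?:([^|\]#/]+)", re.IGNORECASE)


def parse_comment_line(line):
    """Return (depth, user, timestamp) for one line of talk-page wikitext.

    Hypothetical helper for illustration only; it is not part of the library.
    """
    # Indentation depth is taken to be the number of leading colons.
    depth = len(line) - len(line.lstrip(":"))
    # The last user link on the line is assumed to belong to the signature.
    users = USER_RE.findall(line)
    user = users[-1].strip() if users else None
    match = TIMESTAMP_RE.search(line)
    timestamp = match.group(1) if match else None
    return depth, user, timestamp


line = (":: I agree. [[User:Alice|Alice]] ([[User talk:Alice|talk]]) "
        "14:05, 3 March 2011 (UTC)")
print(parse_comment_line(line))
```

Real talk pages contain customized signatures, unsigned comments and comments spanning several lines, so a sketch like this covers only the simplest case.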
Tested with Python 2.7
David Laniado and Riccardo Tasso
- The parser works only for the English Wikipedia. We are currently working to make it multilingual.
- This version was only tested with article talk pages. Support for user talk pages will be added
- Users are identified by user name and by a user id generated by the software (official Wikipedia user ids are not supported)
- The "Outdent" command is currently not handled
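As a hedged illustration of the last limitation, the sketch below shows one way lines containing an outdent marker could be detected, so that their indentation can be treated with caution. It assumes the marker is written as the `{{outdent}}` template or its short form `{{od}}`, optionally with a parameter; the name `has_outdent` and the regular expression are hypothetical and are not part of WikiTalkParser.

```python
import re

# Matches "{{outdent}}", "{{od}}", and forms with a parameter such as
# "{{outdent|::::}}". Other variants of the template are not covered.
OUTDENT_RE = re.compile(r"\{\{\s*(?:outdent|od)\s*(?:\|[^}]*)?\}\}",
                        re.IGNORECASE)


def has_outdent(line):
    """Return True if the line contains an outdent template (hypothetical helper)."""
    return OUTDENT_RE.search(line) is not None


print(has_outdent("{{outdent|::::}} Back at the left margin."))
print(has_outdent(":: An ordinary indented reply."))
```

A comment that follows such a marker is usually a reply to the preceding comment even though it has no leading colons, so counting colons alone would place it at the wrong depth.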
For further information, see the research paper "When the Wikipedians talk: network and tree structure of Wikipedia discussion pages".