MER-C/wiki-java

Wiki.java FAQ/TODO/whinge list and thoughts

MER-C opened this issue · 3 comments

MER-C commented

Pull requests are welcome on some of these, but please ask for my thoughts first.

MediaWiki annoyances and wishlist

a.k.a. why can't I do X?

Missing features

Vectorization

  • getLastRevision
  • getDeletedText: adding more titles gives all deleted revisions in those pages...
  • getDeletedHistory (do after reverse is culled)
  • getDeletedRevisions
  • getBlockList (users only). Add filters and support IPs properly.
  • Go wide in parse() to fetch the parsed text, original wikitext, wikilinks, categories, external links, sections and templates, all at the same time. This could be the base of a Page object (yes, I finally have meaningful data to put in there.)
  • Make text a field of Revision, let getRevisions fetch text optionally, and have Revision.getText load the text lazily (with a warning that it shouldn't be used in loops).
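
The lazy-loading idea in the last item could look roughly like the sketch below. LazyRevision and its fetcher parameter are hypothetical stand-ins for illustration, not the existing Wiki.Revision API; the fetcher represents whatever network call would retrieve the text.

```java
import java.util.function.LongFunction;

/**
 * Hypothetical sketch, not the real Wiki.Revision: the text is a field that
 * is either supplied up front or fetched once on first access.
 */
class LazyRevision {
    private final long revid;
    private final LongFunction<String> fetcher; // stand-in for a network call
    private String text; // null until loaded

    /** Text not fetched yet; it is loaded on the first getText() call. */
    LazyRevision(long revid, LongFunction<String> fetcher) {
        this.revid = revid;
        this.fetcher = fetcher;
    }

    /** Text already known, e.g. because it was requested in bulk. */
    LazyRevision(long revid, String text) {
        this(revid, id -> text);
        this.text = text;
    }

    /**
     * Returns the revision text, fetching it at most once.
     * Warning: each unloaded revision costs one fetch, so avoid calling this
     * in a loop over revisions that were created without text.
     */
    synchronized String getText() {
        if (text == null)
            text = fetcher.apply(revid);
        return text;
    }
}
```

A bulk fetch would use the second constructor so that getText never touches the network; the first constructor is for callers that only occasionally need the text.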

General FIXMEs

  • parse+diff: missingtitle is a generic error message that represents real, unrecoverable assertion errors in other methods.
  • LogEntry details handling (#126)
  • Simplify site info caching.

Deprecated API removal

  • Change signatures of parse and diff. Deprecate some trampolines.

WMF specific

  • OAuth support (#153)

Utilities

  • CSV export (revisions, log entries, user info?, page info?)
  • Diff parsing -- refactoring? I'd like to see machine readable diffs first.
  • Export of tabular data to wiki table (may be useful, just an idea at the moment)
  • LogEntry -> wikitext table
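
As a rough illustration of the wiki table export idea, here is a minimal sketch that turns generic tabular data into a wikitext table. WikitextTable and toWikitextTable are hypothetical names, not part of Wiki.java, and the cells are assumed to be wikitext already (no escaping is done, so a bare pipe in a cell would break the row).

```java
import java.util.List;

/** Hypothetical helper, not part of Wiki.java. */
class WikitextTable {
    /**
     * Builds a wikitable with one header row and one table row per entry
     * in rows. Cell contents are inserted as-is, without escaping.
     */
    static String toWikitextTable(List<String> headers, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("{| class=\"wikitable\"\n");
        sb.append("! ").append(String.join(" !! ", headers)).append('\n');
        for (List<String> row : rows) {
            // "|-" starts a new row, "||" separates cells on the same line
            sb.append("|-\n| ").append(String.join(" || ", row)).append('\n');
        }
        sb.append("|}");
        return sb.toString();
    }
}
```

A LogEntry export could then be a thin wrapper that maps each entry to a list of strings (user, action, target and so on) and passes the result to this method.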

Tools

  • Explore stuff in paid-for spam
  • UserLinkAdditionFinder: servlet version -- limited to one user per request if useful.
  • UserLinkAdditionFinder/CCIAnalyzer: do not return links or analyze text that was already there. Requires diff parsing refactoring.
  • CCIAnalyzer: aggressive mode (#97)
  • Transition user watchlist into a generic mass contribution fetcher, particularly from categories and lists of users. The tool should support new pages only (for spam sockfarms).
  • ContributionSurveyor: split long surveys into multiple text files, 2000 articles per file, and serve them ZIPped.
  • AdminStats: protections.
  • AdminStats: writeup and plots.
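
For the ContributionSurveyor item above, splitting a long survey into fixed-size files and zipping them might be sketched as follows. SurveyZipper, chunks, zipChunks and the survey-N.txt entry names are all hypothetical; each list element is assumed to be the already formatted survey text for one article.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper, not part of ContributionSurveyor. */
class SurveyZipper {
    /** Splits items into consecutive sublists of at most size elements. */
    static <T> List<List<T>> chunks(List<T> items, int size) {
        if (size <= 0)
            throw new IllegalArgumentException("size must be positive");
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size)
            result.add(items.subList(start, Math.min(start + size, items.size())));
        return result;
    }

    /** Writes one text file per chunk into an in-memory ZIP archive. */
    static byte[] zipChunks(List<String> articles, int perFile) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            int fileNumber = 1;
            for (List<String> chunk : chunks(articles, perFile)) {
                zip.putNextEntry(new ZipEntry("survey-" + fileNumber++ + ".txt"));
                zip.write(String.join("\n", chunk).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        // The archive is complete only after the stream has been closed.
        return buffer.toByteArray();
    }
}
```

Calling zipChunks(articles, 2000) would give the 2000-articles-per-file layout described above; a servlet could write the returned bytes straight to the response.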

Non-problems and implementation notes

  • Why is X (e.g. assertion modes, log types, namespaces) not implemented as an Enum? MediaWiki has a large library of extensions, and each extension may add more possible values. Furthermore, the site owner may add other possible values (e.g. more namespaces). An Enum is a closed set fixed at compile time, so it cannot represent these. Wiki.java only covers MediaWiki as shipped with no extensions.
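
To make the Enum point concrete, here is a minimal sketch of the open-ended alternative: constants for the values that ship with MediaWiki, plus a registry that can accept site-specific values at runtime. NamespaceRegistry and its methods are hypothetical and only illustrate the design choice; they are not how Wiki.java stores namespaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an open value set; not part of Wiki.java. */
class NamespaceRegistry {
    // Values that ship with core MediaWiki can still be plain constants.
    static final int MAIN_NAMESPACE = 0;
    static final int TALK_NAMESPACE = 1;

    private final Map<String, Integer> idsByName = new LinkedHashMap<>();

    NamespaceRegistry() {
        register("", MAIN_NAMESPACE);
        register("Talk", TALK_NAMESPACE);
    }

    /** Adds a namespace discovered at runtime, e.g. from the site's own configuration. */
    void register(String name, int id) {
        idsByName.put(name, id);
    }

    /** Looks up a namespace ID, including ones no compile-time Enum could list. */
    int idOf(String name) {
        Integer id = idsByName.get(name);
        if (id == null)
            throw new IllegalArgumentException("Unknown namespace: " + name);
        return id;
    }
}
```

With an Enum, a site-defined namespace (say a "Portal" namespace with ID 100, used here purely as an illustration) would have no constant to map to; with the registry it is one register call.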

I was wondering if anyone else has a use case for having the two upload methods support warnings. The idea would be to turn ignorewarnings into a method parameter and let the methods return a list of warnings if it's not enabled.

https://www.mediawiki.org/wiki/API:Upload#Upload_warnings

This would be very useful for Pattypan as it could serve as a neat solution to detect duplicates and raise other potential issues.

MER-C commented

I did the obvious thing with fd9b2dd.

I'm afraid fd9b2dd won't solve it on our end, as we want to detect duplicates prior to or during the upload. We can solve it by just sending the hash to the WM-API, but I imagine more upload tools could benefit from a way to get upload warnings.