Wiki.java FAQ/TODO/whinge list and thoughts
MER-C opened this issue · 3 comments
MER-C commented
Pull requests are welcome on some of these, but please ask for my thoughts first.
MediaWiki annoyances and wishlist
a.k.a. why can't I do X?
- Revisions from contribs don't have SHA-1 populated: https://phabricator.wikimedia.org/T185809
- Revisions from watchlist don't have SHA-1 populated: https://phabricator.wikimedia.org/T185808
- Range contribs through the API: https://phabricator.wikimedia.org/T177150. Remove rangeContribs once this is done.
- getBlockList() returning rangeblocks if given a specific IP address: https://phabricator.wikimedia.org/T183300
- No [[Special:Recentchangeslinked]]: https://phabricator.wikimedia.org/T17552
- No filtering by RevisionDelete status in rcoptions: https://phabricator.wikimedia.org/T28874, https://phabricator.wikimedia.org/T29019
- Cannot vectorize getFirstRevision: https://phabricator.wikimedia.org/T188672
- Vectorization of getPageHistory is not allowed for the same reason as immediately above.
- No options HashMap in deletedContribs: https://phabricator.wikimedia.org/T185705
- Can't use IP ranges as a target in getLogEntries: https://phabricator.wikimedia.org/T183300
- Size diffs must be computed manually for getPageHistory, getRevisions, etc.: https://phabricator.wikimedia.org/T143444
- Cannot search the archive table for deleted pages: https://phabricator.wikimedia.org/T192023
- No machine readable diffs: https://phabricator.wikimedia.org/T56328 . See also https://phabricator.wikimedia.org/T117279 for an alternative diff format.
- Preview mode warnings in parse(). The blame lies with editors abusing wikimarkup to detect whether they are in preview mode to show warnings (fair enough) and MediaWiki not providing a definitive way to do this. See https://phabricator.wikimedia.org/T141970
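For the size diff item above, computing the differences by hand might look roughly like the sketch below. `SizeDiffs` and `compute` are hypothetical names and not part of Wiki.java. The sketch assumes the revision sizes are already available as a list ordered oldest first, and that the first entry is the page creation (compared against an empty page).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Wiki.java.
public class SizeDiffs
{
    /**
     * Computes the size change introduced by each revision.
     * Assumes sizes are in bytes and ordered oldest first.
     */
    public static List<Integer> compute(List<Integer> sizes)
    {
        List<Integer> diffs = new ArrayList<>();
        int previous = 0; // assumption: the first revision creates the page
        for (int size : sizes)
        {
            diffs.add(size - previous);
            previous = size;
        }
        return diffs;
    }
}
```

If the list does not start at the page creation, the first entry of the result is not a meaningful diff and should be discarded or computed from the parent revision.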
Missing features
- Tags
- Deleted files
- Revision deletion of LogEntry (c3380f4), deleted revisions and old and deleted images
- User account creation is deliberately not implemented.
Vectorization
- getLastRevision
- getDeletedText: adding more titles gives all deleted revisions in those pages...
- getDeletedHistory (do after reverse is culled)
- getDeletedRevisions
- getBlockList (users only). Add filters and support IPs properly.
- Go wide in parse() to fetch the parsed text, original wikitext, wikilinks, categories, external links, sections and templates, all at the same time. This could be the base of a Page object (yes, I finally have meaningful data to put in there.)
- Make text a field of Revision, getRevisions fetch text optionally and Revision.getText lazy loading of text (with a warning that it shouldn't be used in loops).
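The lazy loading idea in the last item could be sketched as follows. `LazyRevision` and its `fetcher` parameter are hypothetical and only stand in for Revision and whatever network call would fetch the text; this is an illustration of the pattern, not the planned implementation.

```java
import java.util.function.LongFunction;

// Hypothetical stand-in for Revision, illustrating lazy loading only.
public class LazyRevision
{
    private final long revid;
    private final LongFunction<String> fetcher; // assumed: performs the network fetch
    private String text; // null until fetched or supplied up front

    public LazyRevision(long revid, String text, LongFunction<String> fetcher)
    {
        this.revid = revid;
        this.text = text;
        this.fetcher = fetcher;
    }

    /**
     * Returns the revision text, fetching it on first use.
     * Calling this in a loop over many revisions costs one request each,
     * so bulk callers should ask for text when fetching the revisions.
     */
    public synchronized String getText()
    {
        if (text == null)
            text = fetcher.apply(revid);
        return text;
    }
}
```

A caller that passes the text to the constructor never triggers the fetcher, which is how an optional "fetch text" flag on getRevisions would avoid the extra requests.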
General FIXMEs
- parse+diff: missingtitle is a generic error message that represents a real, unrecoverable assertion error in other methods.
- LogEntry details handling (#126)
- Simplify site info caching.
Deprecated API removal
- Change signatures of parse and diff. Deprecate some trampolines.
WMF specific
- OAuth support (#153)
Utilities
- CSV export (revisions, log entries, user info?, page info?)
- Diff parsing -- refactoring? I'd like to see machine readable diffs first.
- Export of tabular data to wiki table (may be useful, just an idea at the moment)
- LogEntry -> wikitext table
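As a rough idea of what the wiki table export could look like, here is a minimal sketch. `WikiTableExport` and `toWikiTable` are hypothetical names. The sketch assumes cell values contain no pipe characters or newlines, since it does no escaping.

```java
import java.util.List;

// Hypothetical helper, not part of Wiki.java.
public class WikiTableExport
{
    /**
     * Renders tabular data as a wikitext table.
     * Assumption: cells contain no "|" or newline characters (no escaping is done).
     */
    public static String toWikiTable(List<String> headers, List<List<String>> rows)
    {
        StringBuilder sb = new StringBuilder("{| class=\"wikitable\"\n");
        sb.append("! ").append(String.join(" !! ", headers)).append("\n");
        for (List<String> row : rows)
        {
            sb.append("|-\n");
            sb.append("| ").append(String.join(" || ", row)).append("\n");
        }
        sb.append("|}");
        return sb.toString();
    }
}
```

A LogEntry -> wikitext table converter could sit on top of this by mapping each log entry to a row of strings first.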
Tools
- Explore stuff in paid for spam
- UserLinkAdditionFinder: servlet version -- limited to one user per request if useful.
- UserLinkAdditionFinder/CCIAnalyzer: do not return links or analyze text that was already there. Requires diff parsing refactoring.
- CCIAnalyzer: aggressive mode (#97)
- Transition user watchlist into a generic mass contribution fetcher, particularly from categories and lists of users. The tool should support new pages only (for spam sockfarms).
- ContributionSurveyor: split long surveys into multiple text files, 2000 articles per file, and serve them ZIPped.
- AdminStats: protections.
- AdminStats: writeup and plots.
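The ContributionSurveyor item (splitting long surveys into files of 2000 articles and zipping them) could be approached along these lines. `SurveyZipper`, its method names and the `survey-N.txt` file naming are all hypothetical, and the sketch assumes one survey line per article.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of ContributionSurveyor.
public class SurveyZipper
{
    /** Splits lines into consecutive chunks of at most perFile entries. */
    public static List<List<String>> split(List<String> lines, int perFile)
    {
        if (perFile <= 0)
            throw new IllegalArgumentException("perFile must be positive");
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += perFile)
            chunks.add(lines.subList(start, Math.min(start + perFile, lines.size())));
        return chunks;
    }

    /** Writes each chunk as its own text file inside an in-memory ZIP archive. */
    public static byte[] toZipBytes(List<String> lines, int perFile)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer))
        {
            int fileNumber = 1;
            for (List<String> chunk : split(lines, perFile))
            {
                // assumed naming scheme: survey-1.txt, survey-2.txt, ...
                zip.putNextEntry(new ZipEntry("survey-" + fileNumber++ + ".txt"));
                zip.write(String.join("\n", chunk).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }
}
```

For very large surveys a servlet would probably stream into the response instead of buffering the archive in memory; the buffer here just keeps the sketch self-contained.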
Non-problems and implementation notes
- Why is X (e.g. assertion modes, log types, namespaces) not implemented as an Enum? MediaWiki has a large library of extensions, and each extension may add more possible values. Furthermore, the site owner may add other possible values (e.g. more namespaces). An Enum is fixed at compile time and cannot represent values that only exist on a particular wiki. Wiki.java only covers MediaWiki as shipped with no extensions.
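To illustrate the Enum point with namespaces: plain int constants for the core values plus a runtime lookup can accommodate whatever a particular wiki defines, which a closed Enum cannot. `NamespaceRegistry` and its methods below are hypothetical and only sketch the idea; they are not how Wiki.java stores namespaces.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Wiki.java's actual namespace handling.
public class NamespaceRegistry
{
    // A few namespaces that exist in core MediaWiki.
    public static final int MAIN_NAMESPACE = 0;
    public static final int TALK_NAMESPACE = 1;
    public static final int USER_NAMESPACE = 2;

    private final Map<Integer, String> names = new HashMap<>();

    public NamespaceRegistry()
    {
        names.put(MAIN_NAMESPACE, "");
        names.put(TALK_NAMESPACE, "Talk");
        names.put(USER_NAMESPACE, "User");
    }

    /** Adds a namespace discovered at runtime, e.g. one defined by the site owner. */
    public void register(int id, String name)
    {
        names.put(id, name);
    }

    public String nameOf(int id)
    {
        String name = names.get(id);
        if (name == null)
            throw new IllegalArgumentException("Unknown namespace: " + id);
        return name;
    }
}
```

With an Enum, a site-specific namespace such as `register(100, "Portal")` would have nowhere to go without recompiling the library.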
Abbe98 commented
I was wondering if anyone else has a use case for having the two upload methods support warnings: turning ignorewarnings into a method parameter and letting the methods return a list of warnings if it's not enabled.
https://www.mediawiki.org/wiki/API:Upload#Upload_warnings
This would be very useful for Pattypan as it could serve as a neat solution to detect duplicates and raise other potential issues.