Encourage non-conflicting edits
Krinkle opened this issue · 2 comments
When making changes to existing content (i.e. not just appending or prepending text), it is important that bots don't accidentally overwrite edits by other users.
The way bots should do this is to record the revision timestamp when fetching the existing content, and pass it to the edit module when saving. This way, if another edit has been made since then, the edit will be rejected. At this point the bot can either try again, or skip the item for the time being.
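To make the retry-or-skip flow concrete, here is a rough sketch. The `fetchPage` and `saveEdit` functions are hypothetical stand-ins (they are not existing mwbot methods), and I am assuming the client surfaces the API's `editconflict` error code as `err.code`.

```javascript
// Sketch only. `io` is a hypothetical object with two callback-style methods:
//   io.fetchPage(title, cb)              -> cb(err, { title, content, timestamp })
//   io.saveEdit(title, text, params, cb) -> cb(err, result)
function editWithoutClobbering(io, title, transform, maxAttempts, done) {
  var attempt = 0;

  function tryOnce() {
    attempt += 1;
    io.fetchPage(title, function (err, page) {
      if (err) { return done(err); }

      // Hand the fetched revision's timestamp back to the API so it can
      // reject the save if a newer revision exists.
      var params = { basetimestamp: page.timestamp };

      io.saveEdit(page.title, transform(page.content), params, function (err, result) {
        // Assumption: the API error code is exposed as err.code.
        var conflict = Boolean(err) && err.code === 'editconflict';
        if (conflict && attempt < maxAttempts) {
          return tryOnce(); // refetch and reapply the change
        }
        if (conflict) {
          // Out of attempts: skip this item for the time being.
          return done(null, { skipped: true, attempts: attempt });
        }
        if (err) { return done(err); }
        done(null, { skipped: false, attempts: attempt, result: result });
      });
    });
  }

  tryOnce();
}
```

The change is recomputed from freshly fetched content on every attempt, so a retry never reapplies text derived from a stale revision.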
mwbot provides a getArticle method, but it doesn't expose any metadata besides the page content.
Please provide an easy way for developers to use mwbot to make edits in a way that doesn't cause human edits to be overwritten by default. Ignoring conflicts could perhaps be an option, but it probably should not be the default.
Ideas:
// getPage(string name) -> API query revisions, rvprop=content|timestamp
client.getPage(name, function (err, data) {
  // data.title
  // data.content
  // data.timestamp
  var newContent = change(data.content);
  // Method 1: edit( pageName, content, summary, params, callback )
  client.edit(data.title, newContent, '', { basetimestamp: data.timestamp }, function (err, data) { });
  // Method 2: edit( string|Object pageData, content, summary, callback )
  client.edit(data, newContent, '', function (err, data) { });
});
The second method is probably easiest and encourages developers to use it without hardcoding details of parameters.
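For the getPage idea above, mapping the raw API response to `{ title, content, timestamp }` might look roughly like this. It is a sketch that assumes the legacy JSON layout (formatversion=1, no `rvslots`), where pages are keyed by page ID and the wikitext sits under the `*` key; `extractPageData` is a made-up name, not an mwbot function.

```javascript
// Sketch: turn the response of
//   action=query&prop=revisions&rvprop=content|timestamp&titles=...
// into the page object used in the ideas above.
// Assumes the legacy (formatversion=1) layout; newer layouts differ.
function extractPageData(response) {
  var pages = response && response.query && response.query.pages;
  if (!pages) { return null; }

  var ids = Object.keys(pages);
  if (ids.length === 0) { return null; }

  var page = pages[ids[0]];
  // Missing pages come back without a revisions array.
  if (!page.revisions || page.revisions.length === 0) { return null; }

  var rev = page.revisions[0];
  return {
    title: page.title,
    content: rev['*'],
    timestamp: rev.timestamp
  };
}
```

Returning null for a missing page lets the caller decide whether to skip the item or create the page explicitly.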
I created a quick draft of this for my own bot. See https://gist.github.com/Krinkle/8e1e0e41baaae63f9839d86d918d512c/c1788e5ececff3ddf373944d3dc41ced99be08af#file-wmf-tour-bot-js-L107-L139. Feel free to use it in any way you like.
client.edit = function(pageData, content, summary, minor, callback) {
  var params = {
    text: content,
    // Avoid accidentally editing as anonymous user if session expires
    assert: 'user'
  };
  if (typeof minor === 'function') {
    callback = minor;
    minor = undefined;
  }
  if (minor) {
    params.minor = '';
  } else {
    params.notminor = '';
  }
  var title;
  if (typeof pageData === 'object') {
    params.basetimestamp = pageData.revision.timestamp;
    params.starttimestamp = new Date().toISOString();
    // Avoid accidentally creating a new page (e.g. if title string got corrupted,
    // or if page was deleted meanwhile).
    params.nocreate = '';
    title = pageData.title;
  } else {
    title = pageData;
  }
  this.doEdit('edit', title, summary, params, callback);
};
This protects bots against various problems, using the same protections that the MediaWiki web interface applies when you edit via a web browser:
- Session loss or expiration (assert=user). Avoid making edits as anonymous user if server lost the session.
- Edit conflict (basetimestamp). Avoid overriding other edits.
- Re-create (starttimestamp). Avoid re-publishing text that was deleted.
- Edit type (nocreate/createonly). Avoid creating new pages or edits. Sometimes if the page title is corrupted (e.g. bad encoding) it can happen that the content is fetched from A but saved to B - causing a duplicate page to be created. (E.g. If spaces are cut off and you fetch from "Foo bar" and save to "Foo", or a character encoding problem).
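As a compact summary, the four protections could be folded into one parameter builder along these lines. The option names (`baseTimestamp`, `startTimestamp`, `mode`) and the function name are made up for this sketch; only the keys on the returned object are actual action=edit parameters. My reading of the API documentation is that starttimestamp should be the time the editing process began, so this sketch takes it from the caller, who would record it when fetching the page. The edit token is left out and would still need to be added by the client.

```javascript
// Sketch: map the protections listed above onto action=edit parameters.
// Option names are hypothetical; returned keys are the API's parameter names.
function buildSafeEditParams(options) {
  var params = {
    action: 'edit',
    title: options.title,
    text: options.text,
    summary: options.summary || '',
    // Session loss: fail instead of saving as an anonymous user.
    assert: 'user'
  };

  // Edit conflict: timestamp of the revision the new text is based on.
  if (options.baseTimestamp) {
    params.basetimestamp = options.baseTimestamp;
  }

  // Re-create: when the bot started editing (ideally the fetch time).
  if (options.startTimestamp) {
    params.starttimestamp = options.startTimestamp;
  }

  // Edit type: at most one of the two flags is set.
  if (options.mode === 'nocreate') {
    params.nocreate = '';
  } else if (options.mode === 'createonly') {
    params.createonly = '';
  }

  return params;
}
```

Leaving `mode` unset keeps the API's default behaviour, which is useful when a bot really does want to create the page if it is missing.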