This is a bot developed in R for editing Semantic MediaWiki templates. The code is still very much in development, and it is highly recommended to test it on a few pages before letting it loose on a wiki.
The primary motivation for Yet Another MediaWiki Bot Framework is that this bot is specifically designed to help with batch editing of data contained in the semantic templates that are commonly used with Semantic MediaWiki.
The main idea is that this bot converts templates into data structures in R. For example, it allows you to read from a wiki page a template such as:
{{City | point=52.015, 4.356667 | country=Netherlands }}
...and then convert this data into a list within R. The data contained in the list can be accessed via template$point, template$country, etc.
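To make the idea concrete, here is a minimal sketch of how a single, non-nested template call could be turned into a named list in base R. The function `parseSimpleTemplate` is hypothetical and is not part of this package; it only illustrates the template-to-list mapping and ignores nested templates and positional parameters.

```r
# Hypothetical helper, not part of RSemanticMediaWikiBot:
# parse one non-nested template call into a named list of parameters.
parseSimpleTemplate <- function(wikiText) {
  # strip the surrounding {{ and }}
  inner <- sub("^\\s*\\{\\{", "", wikiText)
  inner <- sub("\\}\\}\\s*$", "", inner)
  # split on the pipe character; the first piece is the template name
  parts <- trimws(strsplit(inner, "|", fixed = TRUE)[[1]])
  params <- list()
  for (part in parts[-1]) {
    # split each "key=value" pair at the first "=" only
    eq <- regexpr("=", part, fixed = TRUE)
    if (eq < 2) next  # skip positional parameters and empty keys
    key <- trimws(substr(part, 1, eq - 1))
    value <- trimws(substr(part, eq + 1, nchar(part)))
    params[[key]] <- value
  }
  params
}

template <- parseSimpleTemplate("{{City | point=52.015, 4.356667 | country=Netherlands }}")
template$point    # "52.015, 4.356667"
template$country  # "Netherlands"
```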
install.packages('devtools') # only if not already installed
library(devtools)
install_github("cbdavis/RSemanticMediaWikiBot")
The functions can then be accessed from within R code by first declaring:
library(RSemanticMediaWikiBot)
# TODO: fill these in based on your own configuration
username = USERNAME
password = PASSWORD
apiURL = "http://my.wiki.com/wiki/api.php"
bot = initializeBot(apiURL)     # initialize the bot
login(username, password, bot)  # log in to the wiki
text = read(title="MyWikiPage", bot)
edit(title="MyWikiPage", text="this is the new page text", bot, summary="my edit summary")
delete(pageName, bot, reason="deleting old page")
Assuming that you are not working with multiple-instance templates, you can retrieve and modify the data in a template as follows:
template = getTemplateByName("MyTemplateName", "MyWikiPage", bot)[[1]]
# [[1]] is needed because a list is returned.
# If the page uses multiple-instance templates, multiple templates will be returned.
valueOfTemplate = template$data$NameOfTemplateParameter
You can then modify this value by:
template$data$NameOfTemplateParameter = newValue
If you want to completely remove a parameter from a template (i.e. both the key and the value) such as changing this:
{{City | point=52.015, 4.356667 | country=Netherlands }}
to this:
{{City | country=Netherlands }}
then you can just do:
template$data$point = NULL
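This works because of standard R list semantics: assigning `NULL` to a list element removes that element entirely. A small base R illustration, using a plain list as a stand-in for `template$data`:

```r
# a plain list standing in for template$data
data <- list(point = "52.015, 4.356667", country = "Netherlands")

data$point <- NULL  # removes both the key and the value
names(data)         # "country"
length(data)        # 1
```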
The template with its new value can then be written back to the wiki as follows:
writeTemplateToPage(template, bot, editSummary="testing bot")
The template contains information about the page which it came from, so the name of the page does not need to be specified.
Spreadsheet data loaded into a data frame makes it easy to write data to templates on multiple pages. The first column of the data frame specifies the name of the page, and the second column specifies the name of the template to write to. The headers of the remaining columns must correspond to the names of the parameters in that template. By default, the code does not overwrite existing values unless you explicitly tell it to. The function returns the entries for which an existing parameter value was found.
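As an illustration of the expected layout, a data frame for the City template might be built like this. The page names, the values, and the names of the first two columns (`page`, `template`) are made up for this example; only the column order and the parameter-named headers follow the description above.

```r
# Illustrative data only: column 1 = page name, column 2 = template name,
# remaining columns = template parameters (here: point, country).
dataFrame <- data.frame(
  page     = c("MyWikiPage1", "MyWikiPage2"),
  template = c("City", "City"),
  point    = c("52.015, 4.356667", "51.92, 4.48"),
  country  = c("Netherlands", "Netherlands"),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)

names(dataFrame)  # "page" "template" "point" "country"
```

When loading from a spreadsheet export, `read.csv("myfile.csv", stringsAsFactors = FALSE)` produces the same kind of object, provided the file has matching column headers.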
# default - will not overwrite parameter values that are already set
errorDFEntries = writeDataFrameToPageTemplates(dataFrame, bot, editSummary="what the bot is doing")
# overwrite existing values
errorDFEntries = writeDataFrameToPageTemplates(dataFrame, bot, overWriteConflicts=TRUE, editSummary="what the bot is doing")
The syntax for a sortable wikitable can be generated from a data frame. The code currently doesn't figure out how to intelligently put it on a page - it's up to you to figure out how to paste things together in some useful way.
# get the wiki table syntax
wikiTable = getWikiTableTextForDataFrame(df)
# put some text before and after the table
pageText = paste(someText, "\n\n", wikiTable, "\n\n", someMoreText, sep="")
# write this all to some wiki page
edit(title=pageTitle, text=pageText, bot, summary="adding a table")
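For readers unfamiliar with the markup, the sketch below shows roughly what sortable wikitable syntax for a data frame looks like. `toWikiTable` is a hypothetical stand-in written for this example; it is not the package's `getWikiTableTextForDataFrame`, whose exact output may differ.

```r
# Hypothetical stand-in, not the package's implementation:
# build MediaWiki sortable-table markup from a data frame.
toWikiTable <- function(df) {
  # table opening line, then one "! header" line per column
  lines <- c('{| class="wikitable sortable"', paste0("! ", names(df)))
  for (i in seq_len(nrow(df))) {
    # convert row i to character, one cell per column
    cells <- vapply(df, function(col) as.character(col[i]), character(1))
    # "|-" starts a new row, each "| value" line is a cell
    lines <- c(lines, "|-", paste0("| ", cells))
  }
  paste(c(lines, "|}"), collapse = "\n")
}

exampleDF <- data.frame(city = "Delft", country = "Netherlands",
                        stringsAsFactors = FALSE)
cat(toWikiTable(exampleDF))
```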
- No support yet for multiple-instance templates. There needs to be a way to distinguish if one wants to edit an existing one, or add another.
- No support yet for adding a new template to a page.
- When editing a page, no check is done to see if it will create the page.
- Nested template calls may not be parsed correctly.
- If the code is not able to connect to the wiki API, then it will terminate instead of trying to connect again. In practical experience, this means that you may have to run a script multiple times if you have several thousand edits.
- There seems to be a memory leak if you read and/or edit around 10,000+ pages.
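Until the connection issue is handled in the package, one possible workaround is to wrap individual calls in a retry helper. The sketch below uses base R's `tryCatch`; `withRetry` and its parameters are hypothetical names introduced here, and the approach only helps if the failed call raises an ordinary R error rather than ending the R session.

```r
# Hypothetical helper, not part of RSemanticMediaWikiBot:
# call f(), retrying up to maxAttempts times if it raises an error.
withRetry <- function(f, maxAttempts = 3, waitSeconds = 5) {
  for (attempt in seq_len(maxAttempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    message("Attempt ", attempt, " failed: ", conditionMessage(result$error))
    # pause before the next attempt, but not after the last one
    if (attempt < maxAttempts) Sys.sleep(waitSeconds)
  }
  stop("All ", maxAttempts, " attempts failed")
}

# Usage, assuming `bot` has been initialized and logged in as shown earlier:
# text <- withRetry(function() read(title = "MyWikiPage", bot))
```

For very large jobs, splitting the list of pages into batches and running each batch in a fresh R session may also help sidestep the memory growth noted above, though this has not been verified against the package.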