Wikidata/editgroups

Inadequate edit group summary (only part after comma chosen)

lucaswerkmeister opened this issue · 3 comments

In this edit group, done with OpenRefine, I used the following edit summary:

import some Theodor Heuss awards and medals, see award item talk page for more information

However, on the edit group page the summary is given as

see award item talk page for more information

which is the less useful half of the summary :)


I assume that the tool tries to split up the full edit summary

Created claim: Property:P166: Q55064109, import some Theodor Heuss awards and medals, see award item talk page for more information (details)

into an auto-summary (Created claim: Property:P166: Q55064109; everything before the first comma), a custom summary (import some Theodor Heuss awards and medals, see award item talk page for more information) and an EditGroups link ((details)). If so, should the regex be tweaked to make the auto-summary part less greedy, so that it splits on the first comma in the full summary instead of the last one?
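To illustrate the greedy versus non-greedy difference, here is a minimal sketch. The two patterns are hypothetical stand-ins, not the regex the tool actually uses, and the "(details)" link is left out for brevity.

```python
import re

# The full edit summary from this issue, without the trailing "(details)" link.
full_summary = (
    "Created claim: Property:P166: Q55064109, "
    "import some Theodor Heuss awards and medals, "
    "see award item talk page for more information"
)

# Hypothetical patterns (assumption: the real tool regex is not shown here).
# Greedy: the auto-summary part swallows everything up to the LAST comma.
greedy = re.compile(r"^(?P<auto>.*), (?P<custom>.*)$")
# Lazy: the auto-summary part stops at the FIRST comma.
lazy = re.compile(r"^(?P<auto>.*?), (?P<custom>.*)$")

greedy_custom = greedy.match(full_summary).group("custom")
lazy_custom = lazy.match(full_summary).group("custom")

print(greedy_custom)  # only the part after the last comma
print(lazy_custom)    # the whole user-provided summary
```

With the lazy version the custom part is the complete user-provided summary, which is what the edit group page would ideally show for this kind of edit.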

Unfortunately, I can’t seem to find the configuration for the individual tools in this repository, except in this migration script, which I assume is out-of-date (it doesn’t mention the “custom bot” tool, for one).

The regex currently is the one from the migration… and yeah, it does not work well! To be honest I was aware of the issue when writing it but I am not sure how to fix it… The issue is that the user-provided summaries do not always start after the first comma. It depends a lot on the action type.

For instance, say you add a description; the summary will look like this:
/* wbsetdescription-add:1|en */ a long description, with some commas in it, importing my awesome database
No idea how to deal with that!
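A small sketch of why splitting on the first comma breaks here. It only uses plain string splitting on the example summary above; nothing in it is taken from the tool's code.

```python
# The description-edit summary from the example above.
summary = (
    "/* wbsetdescription-add:1|en */ a long description, "
    "with some commas in it, importing my awesome database"
)

# Splitting on the first comma cuts the description itself in half.
_, _, after_first_comma = summary.partition(", ")
# Splitting on the last comma happens to recover the user summary here,
# but only because the user summary contains no comma of its own.
_, _, after_last_comma = summary.rpartition(", ")

print(after_first_comma)
print(after_last_comma)
```

So neither the first nor the last comma is a reliable boundary across all action types.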

One other idea would be to look at the first few summaries and keep the longest common suffix of them, maybe… but if you are always adding the same description then this is also going to end up in the summary… I really don't know! But happy to change the regex for sure.
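A rough sketch of the longest-common-suffix idea, including the failure case mentioned above. The function name, the second item ID (Q1234) and the "German politician" description are made up for illustration.

```python
import os


def longest_common_suffix(summaries):
    """Return the longest string that every summary ends with."""
    # os.path.commonprefix compares character by character, so applying it
    # to the reversed strings yields the (reversed) common suffix.
    reversed_summaries = [s[::-1] for s in summaries]
    return os.path.commonprefix(reversed_summaries)[::-1]


user_part = (
    "import some Theodor Heuss awards and medals, "
    "see award item talk page for more information"
)
claim_summaries = [
    "Created claim: Property:P166: Q55064109, " + user_part,
    "Created claim: Property:P166: Q1234, " + user_part,  # hypothetical ID
]
# The separator survives at the front of the suffix, so strip it.
claim_suffix = longest_common_suffix(claim_summaries).lstrip(", ")

# Failure case: the same description is added to every item, so the
# description itself becomes part of the common suffix.
description_summaries = [
    "/* wbsetdescription-add:1|en */ German politician, importing my awesome database",
    "/* wbsetdescription-add:1|en */ German politician, importing my awesome database",
]
description_suffix = longest_common_suffix(description_summaries)
```

In the first case the suffix is exactly the user-provided summary; in the second it is the entire edit summary, auto-generated part included.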

I went for a hopefully bullet-proof solution, which consists of replacing commas in user-provided summaries with a very similar Unicode character. It's ugly but I find it funny.
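A minimal sketch of how that substitution could work. The specific character is not stated in this thread, so U+201A (SINGLE LOW-9 QUOTATION MARK) is used purely as a stand-in, and `sanitize_summary` is a hypothetical helper name.

```python
# Assumption: the actual lookalike character is not given in this thread;
# U+201A is used here only as an illustrative stand-in.
LOOKALIKE_COMMA = "\u201a"


def sanitize_summary(user_summary):
    """Replace real commas so the user summary can never contain ', '."""
    return user_summary.replace(",", LOOKALIKE_COMMA)


auto_summary = "Created claim: Property:P166: Q55064109"
user_summary = (
    "import some Theodor Heuss awards and medals, "
    "see award item talk page for more information"
)

# The editing tool would build the full summary from the sanitized user part.
full_summary = f"{auto_summary}, {sanitize_summary(user_summary)}"

# Splitting on the last real comma now always isolates the user summary.
_, _, recovered = full_summary.rpartition(", ")
print(recovered)
```

Because the user part no longer contains a real comma, the existing split on the last comma stays correct even when the auto-generated part (for example a description) contains commas.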