glimmerphoenix/WikiDAT

Duplicate page and revision entries in iswiki (20140603)

glimmerphoenix opened this issue · 1 comment

Found 107 duplicate page entries and 1000 duplicate revision entries in the database for language iswiki, dump date 20140603.

SELECT page_id FROM page GROUP BY page_id HAVING COUNT(*) >= 2;
...
107 rows in set (0.08 sec)

SELECT rev_id FROM revision GROUP BY rev_id HAVING COUNT(*) >= 2;
...
1000 rows in set (6.29 sec)

This should never occur: each page and revision element is parsed only once, and the compressed XML dump file should contain no duplicate elements.

Further inspection is required to determine whether this is caused by a faulty dump file or by a problem in the parser.
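
One way to rule out the dump file is to scan it directly for repeated ids, independently of the parser and the database. The following is only a minimal sketch, not WikiDAT code: the function names `local_name` and `find_duplicate_ids` are hypothetical, and it assumes the usual MediaWiki export layout in which each `<page>` and each `<revision>` has an `<id>` as a direct child.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def direct_id(elem):
    """Return the integer text of the first direct <id> child, or None."""
    for child in elem:
        if local_name(child.tag) == "id":
            return int(child.text)
    return None


def find_duplicate_ids(xml_stream):
    """Return (duplicate page ids, duplicate revision ids) as sorted lists."""
    page_ids, rev_ids = Counter(), Counter()
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "revision":
            rev_id = direct_id(elem)  # ignores the nested <contributor><id>
            if rev_id is not None:
                rev_ids[rev_id] += 1
            elem.clear()  # drop revision text to keep memory use low
        elif name == "page":
            page_id = direct_id(elem)
            if page_id is not None:
                page_ids[page_id] += 1
            elem.clear()
    dup_pages = sorted(i for i, n in page_ids.items() if n >= 2)
    dup_revs = sorted(i for i, n in rev_ids.items() if n >= 2)
    return dup_pages, dup_revs


# Tiny in-memory sample (made up for illustration) with one repeated page
# and one repeated revision.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>A</title><id>1</id>
    <revision><id>10</id><contributor><id>99</id></contributor></revision>
    <revision><id>11</id></revision>
  </page>
  <page><title>A</title><id>1</id>
    <revision><id>10</id></revision>
  </page>
</mediawiki>"""

dup_pages, dup_revs = find_duplicate_ids(io.BytesIO(SAMPLE))
print(dup_pages, dup_revs)  # prints: [1] [10]
```

For a real dump, the stream could be opened with `bz2.open(path, "rb")` (assuming a bz2-compressed file) and passed to `find_duplicate_ids`. If both lists come back empty, the dump is clean and the duplicates were introduced while loading the data.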

This appears to be a spurious bug caused by consecutive executions rather than a real problem in the parser or the dump. Closing for now.