Modified the original code in a (very) quick-and-dirty way to also save the content of the Wikipedia pages.
saramagliacane/wikipedia-provenance
Creating a provenance benchmark dataset out of Wikipedia history pages
Java