Tools to collect Wikipedia string data.

Wikipedia Data Scrape

Tools to create data sets that mimic the SKEW and DISTINCT files from:

Askitis, Nikolas, and Justin Zobel. "Redesigning the string hash table, burst trie, and BST to exploit cache." Journal of Experimental Algorithmics (JEA) 15 (2010): 1-7.
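The sketch below is a hypothetical illustration, not the tools in this repository. It assumes that a DISTINCT-style file holds each string exactly once and that a SKEW-style file keeps every occurrence, so frequent strings repeat and the frequency distribution stays skewed. Whitespace tokenization and the helper names make_skew, make_distinct and top_frequencies are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def make_skew(tokens: Iterable[str]) -> List[str]:
    """SKEW-style set (assumed): keep every non-empty occurrence, in input order."""
    return [t for t in tokens if t]


def make_distinct(tokens: Iterable[str]) -> List[str]:
    """DISTINCT-style set (assumed): each non-empty string once, in first-occurrence order."""
    # dict keys preserve insertion order (Python 3.7+), so this deduplicates stably.
    return list(dict.fromkeys(t for t in tokens if t))


def top_frequencies(tokens: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent strings, showing how skewed the data is."""
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    # Assumption: plain whitespace splitting stands in for real Wikipedia tokenization.
    tokens = "the cat and the hat and the bat".split()
    skew = make_skew(tokens)
    distinct = make_distinct(tokens)
    print(len(skew), len(distinct))   # 8 5
    print(top_frequencies(skew, 2))   # [('the', 3), ('and', 2)]
```

Keeping first-occurrence order in the distinct set makes the output deterministic for a given input, which helps when comparing runs.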

Configure and Build

  1. Execute make all
  2. Execute R --vanilla < create_data_sets.R
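The two build steps can also be driven from Python, for example from a larger pipeline. This is a minimal sketch using only the standard library. The run_build function and its dry_run flag are hypothetical. The shell redirection "< create_data_sets.R" is reproduced by passing the opened file as stdin. The sketch assumes make and R are on the PATH and that it runs from the repository root.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

# The steps from "Configure and Build": (argv, file fed on stdin or None).
BUILD_STEPS: List[Tuple[List[str], Optional[str]]] = [
    (["make", "all"], None),
    (["R", "--vanilla"], "create_data_sets.R"),
]


def run_build(dry_run: bool = True) -> List[str]:
    """Run the steps in order; check=True stops at the first non-zero exit."""
    log = []
    for argv, stdin_path in BUILD_STEPS:
        shown = " ".join(argv) + (f" < {stdin_path}" if stdin_path else "")
        log.append(shown)
        if dry_run:
            continue  # only record what would run
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        if stdin_path:
            # Equivalent of the shell's "< file" redirection.
            with open(stdin_path, "rb") as f:
                subprocess.run(argv, stdin=f, check=True)
        else:
            subprocess.run(argv, check=True)
    return log


if __name__ == "__main__":
    for line in run_build(dry_run=True):
        print(line)
```

Running with dry_run=True prints the two commands without executing them. Passing dry_run=False runs them for real.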

Data and code are licensed under the Creative Commons Attribution-ShareAlike 3.0 License.