dsci-benchmark

R scripts for benchmarking next word prediction algorithms developed for the Coursera Data Science Capstone Project.


Next word prediction benchmark

A simple R script for benchmarking a next word prediction algorithm.

Note: a modified version of the benchmark for the current iteration of the Capstone project is available in [this fork by Hernán](https://github.com/hfoffani/dsci-benchmark).

Usage:

  1. Download the repository
  2. Extract data.zip into the repository folder (the password is provided in the Coursera forum)
  3. Open benchmark.R and run the code up to section 03
  4. (Optional) Create a wrapper function for your prediction function (section 03)
  5. Run the benchmark (section 04)
  6. Report your results in the Coursera forum
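Step 4 can be sketched as follows. This is a minimal, hypothetical sketch that assumes the benchmark calls your function with a single character string (the text typed so far) and expects a character vector of at most three candidate next words in return; check section 03 of benchmark.R for the exact interface it uses. The names `my_model_predict` and `predict_wrapper` are made up for illustration, and the stub model only stands in for your own predictor.

```r
# HYPOTHETICAL stand-in for your own model. It has a different signature from
# the one the benchmark is assumed to expect: it takes the last two words as
# separate arguments and returns a named numeric vector of candidate scores.
my_model_predict <- function(w1, w2) {
  if (w2 == "to") {
    c(be = 0.40, the = 0.30, get = 0.20, see = 0.10)
  } else {
    c(the = 0.30, a = 0.25, to = 0.20, of = 0.15)
  }
}

# HYPOTHETICAL wrapper, assuming the benchmark passes one string and wants
# back a character vector of at most three predicted next words.
predict_wrapper <- function(x) {
  # Lower-case, trim, and split the input into words on runs of whitespace
  words <- strsplit(tolower(trimws(x)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  # Keep the last two words, padding with "" when fewer are available
  last2 <- tail(c("", "", words), 2)
  scores <- my_model_predict(last2[1], last2[2])
  # Rank candidates by descending score and return the top three names
  head(names(sort(scores, decreasing = TRUE)), 3)
}
```

For example, `predict_wrapper("I would like to")` returns `c("be", "the", "get")`. Keeping the adaptation in a thin wrapper means your model code does not have to change to match whatever calling convention the benchmark uses; you would then pass the wrapper to the benchmark in section 04 in place of the example function.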

File descriptions:

  • data.zip: archive containing the benchmark datasets.
  • benchmark.R: script needed to perform the benchmark (see above).
  • generate_dataset.R: script used to generate the benchmark datasets. It should not be re-run and is provided for reference only.