/imdb_data

Script to download the IMDb data and convert it to TSV files

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Introduction

This repository contains code that uses the tensor2tensor library to download the IMDb dataset and write it out as TSV (tab-separated values) files, one for training and one for testing.

Usage

  1. Create and switch to a new Python 3.6+ environment.
  2. Navigate to the project's root directory.
  3. Execute:
    pip install -r requirements.txt
  4. Execute:
    python create_imdb_dataset.py --output_dir OUTPUT_DIR
    where OUTPUT_DIR is the directory in which to save the training and test files.
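
To illustrate the kind of output the steps above produce, here is a minimal sketch of writing labeled reviews to a TSV file and reading them back using only the Python standard library. It is not the code in create_imdb_dataset.py: the helper names `write_tsv` and `read_tsv`, the file name `train.tsv`, and the two-column (label, text) layout are assumptions made for illustration, so check the generated files for the actual names and column order.

```python
import csv
import os
import tempfile


def write_tsv(rows, path):
    """Write (label, text) pairs to a tab-separated file, one example per row.

    The (label, text) column layout is an assumption for this sketch.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
        for label, text in rows:
            # Collapse tabs and newlines into single spaces so that each
            # review occupies exactly one row.
            writer.writerow([label, " ".join(text.split())])


def read_tsv(path):
    """Read the file back as a list of (int label, str text) tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(int(label), text) for label, text in csv.reader(f, delimiter="\t")]


if __name__ == "__main__":
    output_dir = tempfile.mkdtemp()  # stand-in for OUTPUT_DIR
    train_path = os.path.join(output_dir, "train.tsv")  # hypothetical file name
    write_tsv([(1, "A great film.\nLoved it."), (0, "Dull\tand slow.")], train_path)
    print(read_tsv(train_path))
```

Collapsing whitespace before writing keeps embedded tabs and line breaks in a review from being mistaken for column or row separators.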