This repository contains code that creates TSV files of the IMDb dataset using the tensor2tensor library.
- Create and switch to a new Python 3.6+ environment.
- Navigate to the project's root directory.
- Execute:
pip install -r requirements.txt
- Execute:
python create_imdb_dataset.py --output_dir OUTPUT_DIR
where OUTPUT_DIR is the path to the directory where you want to save the training and test files.
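
Once the script has finished, the generated files can be inspected with Python's standard `csv` module. The sketch below is only an illustration: the helper `read_tsv`, the file name `train.tsv`, and the assumption that each line holds tab-separated fields without quoting are hypothetical and not taken from this repository, so adjust them to whatever actually appears in OUTPUT_DIR.

```python
import csv
from pathlib import Path


def read_tsv(path, limit=None):
    """Return the rows of a tab-separated file as lists of strings.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    rows = []
    with Path(path).open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside review text as literal characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Assumed location and file name: replace with a file present in your OUTPUT_DIR.
    example = Path("OUTPUT_DIR") / "train.tsv"
    if example.exists():
        for row in read_tsv(example, limit=3):
            print(row)
```

Printing only the first few rows is a quick way to confirm the column layout before loading the full files.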