YouTube-8M TensorFlow Starter Code

This repo contains starter code for training and evaluating machine learning models over the YouTube-8M dataset. Out of the box, you can train a logistic classification model over either frame-level or video-level features. The code can be easily extended to train more sophisticated models.

Requirements

This starter code requires TensorFlow. If you haven't installed it yet, follow the instructions on tensorflow.org. This code has been tested with TensorFlow version 0.12.0-rc1. Going forward, we will continue to target the latest released version of TensorFlow.
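As a quick sanity check, you can print the installed TensorFlow version from the command line; the version string on your machine may of course differ from the one we tested with:

python -c "import tensorflow as tf; print(tf.__version__)"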

You can download the YouTube-8M data files from the YouTube-8M download page. We recommend downloading the smaller video-level features dataset first when getting started.

Quick Start on Video-Level Features

To start training a logistic model on the video-level features, run

MODEL_DIR=/tmp/yt8m
python train.py --train_data_pattern='/path/to/features/train*.tfrecord' --train_dir=$MODEL_DIR/logistic_model

Since the dataset is sharded into 4096 individual files, we use a wildcard (*) to represent all of those files.
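If you want to verify that the wildcard pattern actually picks up the shards you expect, here is a minimal sketch using tf.gfile.Glob; the path below is a placeholder, so substitute your own feature directory:

import tensorflow as tf

# Expand the wildcard and report how many training shards it matches.
files = tf.gfile.Glob('/path/to/features/train*.tfrecord')
print('Found %d training shard(s).' % len(files))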

To evaluate the model, run

python eval.py --eval_data_pattern='/path/to/features/validate*.tfrecord' --train_dir=$MODEL_DIR/logistic_model

While the model is training or evaluating, you can view its progress on TensorBoard by running

tensorboard --logdir=$MODEL_DIR

and navigating to http://localhost:6006 in your web browser.

When you are happy with your model, you can generate a CSV file of predictions from it by running

python inference.py --output_file=predictions.csv --input_data_pattern='/path/to/features/validate*.tfrecord' --train_dir=$MODEL_DIR/logistic_model

This will output the top 20 predicted labels from the model for every example to 'predictions.csv'.
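If you want to inspect the predictions programmatically, here is a minimal sketch using Python's csv module. It assumes the file has a header row and that each subsequent row contains a video id followed by a single column of space-separated label/confidence pairs; check the actual file and adjust the parsing if its layout differs:

import csv

with open('predictions.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for video_id, label_scores in reader:
        # label_scores is assumed to look like "label1 score1 label2 score2 ..."
        pairs = label_scores.split()
        top_label, top_score = pairs[0], float(pairs[1])
        print('%s: top label %s (%.3f)' % (video_id, top_label, top_score))
        break  # print only the first row as a spot check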

Using Frame Level Features

Follow the same instructions as above, appending --frame_features=True --model=FrameLevelLogisticModel --feature_names=inc3 to the train.py, eval.py, and inference.py commands.
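For example, the frame-level training command would look like the following; the feature path is a placeholder, and the separate train_dir simply keeps the frame-level model apart from the video-level one:

python train.py --train_data_pattern='/path/to/features/train*.tfrecord' --frame_features=True --model=FrameLevelLogisticModel --feature_names=inc3 --train_dir=$MODEL_DIR/frame_level_logistic_model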

The 'FrameLevelLogisticModel' is designed to provide equivalent results to a logistic model trained over the video-level features. Please look at the 'models.py' file to see how to implement your own models.
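As a rough illustration, here is a minimal sketch of a custom video-level model written in the style of 'models.py'. The base class name, the create_model signature, and the {"predictions": ...} return value are assumptions based on the existing models, so verify them against 'models.py' before relying on this:

import tensorflow as tf

import models  # the starter code's model definitions


class MyVideoLevelModel(models.BaseModel):
  """Hypothetical one-hidden-layer classifier over video-level features."""

  def create_model(self, model_input, vocab_size, **unused_params):
    # model_input is a batch of video-level feature vectors.
    hidden = tf.contrib.layers.fully_connected(
        model_input, 512, activation_fn=tf.nn.relu)
    output = tf.contrib.layers.fully_connected(
        hidden, vocab_size, activation_fn=tf.nn.sigmoid)
    # The training and evaluation loops read the "predictions" tensor.
    return {"predictions": output}

You would then select it with --model=MyVideoLevelModel, assuming train.py looks models up by class name.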

Notes

Note that by default, the train job will try to resume from an existing model checkpoint if it finds one in the training directory. This may not be the behavior you want, especially during development. To start training a fresh model, add the --start_new_model flag to your run configuration.
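For example, to train the video-level logistic model from scratch:

python train.py --train_data_pattern='/path/to/features/train*.tfrecord' --train_dir=$MODEL_DIR/logistic_model --start_new_model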

Overview of Files

Training

  • train.py: The primary script for training models.
  • losses.py: Contains definitions for loss functions.
  • models.py: Contains definitions for models.
  • readers.py: Contains definitions for the Video dataset and Frame dataset readers.

Evaluation

  • eval.py: The primary script for evaluating models.
  • eval_util.py: Provides a class that calculates all evaluation metrics.
  • average_precision_calculator.py: Functions for calculating average precision.
  • mean_average_precision_calculator.py: Functions for calculating mean average precision.

Inference

  • inference.py: Generates an output file containing predictions of the model over a set of videos.

Misc

  • README.md: This documentation.
  • utils.py: Common functions.

About this project

This project is meant to help people quickly get started working with the YouTube-8M dataset. This is not an official Google product.