YouTube-8M Video-Level Features Analysis

In this project, we apply several big data analytics techniques and algorithms to study the YouTube-8M dataset, which contains machine-generated labels, RGB features, and audio features for each video. We identify the dominant video categories, mine frequent itemsets of video categories, and group videos into clusters using the available features.

Dataset

This project uses the YouTube-8M video-level features dataset. Video-level features are stored as tensorflow.Example protocol buffers. A tensorflow.Example proto is reproduced here in text format:

features: {
  feature: {
    key  : "id"
    value: {
      bytes_list: {
        value: (Video id)
      }
    }
  }
  feature: {
    key  : "labels"
    value: {
      int64_list: {
        value: [1, 522, 11, 172]  # label list
      }
    }
  }
  feature: {
    # Average of all 'rgb' features for the video
    key  : "mean_rgb"
    value: {
      float_list: {
        value: [1024 float features]
      }
    }
  }
  feature: {
    # Average of all 'audio' features for the video
    key  : "mean_audio"
    value: {
      float_list: {
        value: [128 float features]
      }
    }
  }
}

Task

Dominant categories on YouTube
Frequent k-itemsets of video categories
Group videos into clusters according to audio features
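
The clustering task can be illustrated with a small K-Means sketch (Lloyd's algorithm). This is only an illustration in plain Python, not the project's actual implementation: the function names, the fixed iteration count, and the random initialisation are assumptions, and the toy 2-D points stand in for the 128-dimensional mean_audio vectors.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, iterations=20, seed=0):
    """Hypothetical helper: returns (centroids, assignments) after a fixed
    number of Lloyd iterations."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return centroids, assignments


# Toy data: two well-separated groups standing in for audio feature vectors.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centroids, labels = kmeans(points, k=2)
```

On real data the number of clusters and the stopping rule would need tuning; a convergence check on centroid movement is a common replacement for the fixed iteration count used here.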

Approach

Count video categories with MapReduce, using the category id as the key
Implement the Apriori algorithm under the MapReduce framework
Implement the K-Means algorithm
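
To make the Apriori step more concrete, here is a minimal single-process sketch of the level-wise search. It is not the project's MapReduce code: all function names are hypothetical, each counting pass only mimics what a map phase (emit a candidate for every transaction that contains it) and a reduce phase (sum per candidate) would do, and candidates are built by simple combination and pruning rather than an optimised join.

```python
from collections import Counter
from itertools import combinations


def count_candidates(transactions, candidates):
    """One counting pass, structured like a MapReduce job."""
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:      # map: emit (candidate, 1)
                counts[candidate] += 1  # reduce: sum per candidate
    return counts


def generate_candidates(frequent, k):
    """Build k-itemsets from items that still occur in frequent
    (k-1)-itemsets, keeping only those whose every (k-1)-subset is
    frequent (the Apriori property)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent
               for sub in combinations(combo, k - 1))
    }


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in at least
    min_support transactions."""
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while candidates:
        counts = count_candidates(transactions, candidates)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result


# Each transaction is the label list of one video (made-up example values).
videos = [[1, 522, 11, 172], [1, 11], [1, 522], [11, 172], [1, 11, 522]]
frequent_itemsets = apriori(videos, min_support=3)
```

In a real MapReduce setting each iteration of the while loop would be a separate job, with the current candidate set shipped to every mapper.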

Command

Task 1

# Execute MapReduce job
yarn jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.3.jar \
-files hdfs:///yt8m-analysis/task1/mapper.py,hdfs:///yt8m-analysis/task1/reducer.py \
-mapper 'python3 mapper.py' \
-reducer 'python3 reducer.py' \
-input /preprocessed_data/category.txt \
-output /yt8m-analysis/task1/output

# View output
hadoop fs -text /yt8m-analysis/task1/output/*
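
The job above refers to mapper.py and reducer.py, which are not reproduced here. As a rough sketch of what such a pair might look like, the following assumes that each line of category.txt holds the whitespace-separated category ids of one video; that input format and all function names are assumptions, not taken from the repository.

```python
import sys
from itertools import groupby


def map_line(line):
    """Yield (category_id, 1) for every category id on one input line."""
    for category_id in line.split():
        yield category_id, 1


def reduce_sorted(pairs):
    """Sum counts per key. The pairs must arrive sorted by key, which is
    how Hadoop Streaming delivers mapper output to a reducer."""
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


def run_mapper(stdin=sys.stdin, stdout=sys.stdout):
    """Body of a hypothetical mapper.py: write 'category_id<TAB>1' lines."""
    for line in stdin:
        for key, value in map_line(line):
            stdout.write(f"{key}\t{value}\n")


def run_reducer(stdin=sys.stdin, stdout=sys.stdout):
    """Body of a hypothetical reducer.py: read sorted 'key<TAB>count'
    lines and write one 'key<TAB>total' line per category."""
    pairs = (
        (key, int(value))
        for key, value in (line.rstrip("\n").split("\t", 1) for line in stdin)
    )
    for key, total in reduce_sorted(pairs):
        stdout.write(f"{key}\t{total}\n")
```

In practice the two halves would live in separate files, each ending with a call to run_mapper() or run_reducer(). The pair can be checked locally without Hadoop by piping the mapper output through sort into the reducer.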

dennis199441/yt8m-analysis
