Can we predict the Most Replayed data of video streaming platforms?

Official repository of the paper "Can we predict the Most Replayed data of video streaming platforms?" [ArXiv] [DOI]

Dataset

Download links

Google Drive: https://drive.google.com/file/d/1R8A7OtA9goaHskOYCxyxBIcZoLoKJU4s/view?usp=sharing

Dataset structure

Each top-level key in the H5 file is the ID of a single video. Each key maps to a Group that contains two H5 Datasets, "features" and "heat-markers".

For instance:

/-14Dre9CVjk (Group with VIDEO_ID as the key)
    /features (Dataset with shape (548, 1024), type "<f8")
    /heat-markers  (Dataset with shape (100,), type "<f8")
/-Gm_IKNRqgQ
    ...

"features" contains the I3D features extracted from the video.

"heat-markers" contains the video's Most Replayed data from YouTube.

To watch a video, browse to youtube.com/watch?v=VIDEO_ID, replacing VIDEO_ID with the video's key in the H5 file.

Code

Code structure

Entry point: model/main.py

User study: evaluation/user_study/