
A repository listing important datasets for multimodal recommender systems


Common Datasets

| Name | Scene | Tasks | Information | URL |
| --- | --- | --- | --- | --- |
| PixelRec | Streaming media | Seq Rec/CF Rec | PixelRec is a large dataset of cover images collected from a short video recommender system, comprising approximately 200 million user-image interactions, 30 million users, and 400,000 video cover images. The texts and other aggregated attributes of videos are also included. | link |
| NineRec | News, Video, Ads, Images | Seq Rec/CF Rec/Cross-domain Rec | NineRec is a large multimodal recommendation dataset collected from five well-known feed platforms, comprising one source dataset for pre-training and 9 diverse target datasets. Both text and high-resolution images are included. | link |
| MicroLens | Short videos | Seq Rec/CF Rec | MicroLens is a large short video recommendation dataset collected from a short video platform, comprising 1 billion interactions, 3 million users, and 1 million short videos. Text, images, audio, and video are all included. | link |
| Amazon Review | Commerce | Seq Rec/CF Rec | A large crawl of product reviews from Amazon. Ratings: 82.83 million, Users: 20.98 million, Items: 9.35 million, Timespan: May 1996 - July 2014. | link |
| Steam | Game | Seq Rec/CF Rec | Reviews represent a great opportunity to break down the satisfaction and dissatisfaction factors around games. Reviews: 7,793,069, Users: 2,567,538, Items: 15,474, Bundles: 615. | link |
| MovieLens | Movie | Rating Prediction | The dataset should not be used for sequential recommendation or for several other top-N recommendation tasks; see https://arxiv.org/pdf/2307.09985.pdf. | link |
| Yelp | Commerce | General | There are 6,990,280 reviews, 150,346 businesses, 200,100 pictures, 11 metropolitan areas, and 908,915 tips by 1,987,897 users, plus over 1.2 million business attributes such as hours, parking, and availability. | link |
| MIND | News | General | MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Every news article contains textual content including title, abstract, body, category, and entities. | link |
| U-NEED | Commerce | Conversation Rec | U-NEED consists of 7,698 fine-grained annotated pre-sales dialogues, 333,879 user behaviors, and 332,148 product knowledge tuples. | link |
| KuaiSAR | Video | Search and Rec | KuaiSAR contains genuine search and recommendation behaviors of 25,877 users, 6,890,707 items, 453,667 queries, and 19,664,885 actions within a span of 19 days on the Kuaishou app. | link |
| Tenrec | Video, Article | General | Tenrec is a large-scale benchmark dataset for recommendation systems. It contains around 5 million users and 140 million interactions, and it covers four recommendation scenarios. | link |
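
When comparing the datasets listed above, it can help to turn the reported counts into an interaction density (observed interactions divided by users times items). The snippet below is a minimal sketch of that arithmetic, not part of any dataset's official tooling. The function name `interaction_density` and the `DATASET_STATS` dictionary are hypothetical, the counts are copied from the descriptions above, and treating each Steam review as one interaction is an assumption.

```python
def interaction_density(n_interactions: int, n_users: int, n_items: int) -> float:
    """Return the fraction of the user-item matrix that is observed."""
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    return n_interactions / (n_users * n_items)


# (interactions, users, items), copied from the dataset descriptions above.
# PixelRec figures are approximate; Steam reviews are assumed to be interactions.
DATASET_STATS = {
    "PixelRec": (200_000_000, 30_000_000, 400_000),
    "Amazon Review": (82_830_000, 20_980_000, 9_350_000),
    "Steam": (7_793_069, 2_567_538, 15_474),
}

for name, (interactions, users, items) in DATASET_STATS.items():
    density = interaction_density(interactions, users, items)
    print(f"{name}: density = {density:.2e}")
```

A lower density means a sparser interaction matrix. These values depend on the raw counts quoted here, so any filtering applied before training (for example k-core filtering) will change them.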