A list of awesome papers and resources on recommender systems based on large language models (LLMs).
🎉 News: Our LLM4Rec survey, "A Survey on Large Language Models for Recommendation", has been released.
Related work and projects will be updated continuously.
If our work has been of assistance to you, please feel free to cite our survey. Thank you.
@article{llm4recsurvey,
author = {Likang Wu and Zhi Zheng and Zhaopeng Qiu and Hao Wang and Hongchao Gu and Tingjia Shen and Chuan Qin and Chen Zhu and Hengshu Zhu and Qi Liu and Hui Xiong and Enhong Chen},
title = {A Survey on Large Language Models for Recommendation},
journal = {CoRR},
volume = {abs/2305.19860},
year = {2023}
}
- The papers and related projects
- Single card (RTX 3090) debuggable generative language models that support Chinese corpus
Note: "Tuning" here indicates only whether the LLM itself has been tuned.
Name | Scene | Tasks | Information | URL |
---|---|---|---|---|
Amazon Review | Commerce | Seq Rec/CF Rec | This is a large crawl of product reviews from Amazon. Ratings: 82.83 million, Users: 20.98 million, Items: 9.35 million, Timespan: May 1996 - July 2014 | link |
Amazon-M2 | Commerce | Seq Rec/CF Rec | A large dataset of anonymized user sessions with their interacted products collected from multiple language sources at Amazon. It includes 3,606,249 train sessions, 361,659 test sessions, and 1,410,675 products. | link |
Steam | Game | Seq Rec/CF Rec | Reviews represent a great opportunity to break down the satisfaction and dissatisfaction factors around games. Reviews: 7,793,069, Users: 2,567,538, Items: 15,474, Bundles: 615 | link |
MovieLens | Movie | General | The dataset consists of 4 sub-datasets, which describe users' ratings of movies and free-text tagging activities from MovieLens, a movie recommendation service. | link |
Yelp | Commerce | General | There are 6,990,280 reviews, 150,346 businesses, 200,100 pictures, 11 metropolitan areas, 908,915 tips by 1,987,897 users. Over 1.2 million business attributes like hours, parking, availability, etc. | link |
Douban | Movie, Music, Book | Seq Rec/CF Rec | This dataset includes three domains, i.e., movie, music, and book, and different kinds of raw information, i.e., ratings, reviews, item details, user profiles, tags (labels), and date. | link |
MIND | News | General | MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Every news article contains textual content including title, abstract, body, category, and entities. | link |
U-NEED | Commerce | Conversation Rec | U-NEED consists of 7,698 fine-grained annotated pre-sales dialogues, 333,879 user behaviors, and 332,148 product knowledge tuples. | link |
PixelRec | Short Video | Seq Rec/CF Rec | PixelRec is a large dataset of cover images collected from a short video recommender system, comprising approximately 200 million user-image interactions, 30 million users, and 400,000 video cover images. The texts and other aggregated attributes of the videos are also included. | link |
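
For the datasets marked Seq Rec above, a common first step is to turn raw interaction logs into per-user, time-ordered item sequences and hold out the last item for evaluation. The following is a minimal sketch of that idea, not the loading code of any listed dataset: the `(user_id, item_id, timestamp)` row layout, the sample rows, and the helper names `build_sequences` and `leave_one_out` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical interaction log: (user_id, item_id, unix_timestamp).
# These rows are made up for illustration and come from none of the datasets above.
interactions = [
    ("u1", "i3", 30),
    ("u1", "i1", 10),
    ("u2", "i2", 5),
    ("u1", "i2", 20),
    ("u2", "i4", 15),
]


def build_sequences(rows):
    """Group interactions by user and order each user's items by timestamp."""
    by_user = defaultdict(list)
    for user, item, ts in rows:
        by_user[user].append((ts, item))
    # Sorting the (timestamp, item) pairs orders them chronologically.
    return {user: [item for _, item in sorted(events)]
            for user, events in by_user.items()}


def leave_one_out(sequences):
    """Hold out each user's last item as the target; skip users with fewer than 2 items."""
    split = {}
    for user, items in sequences.items():
        if len(items) >= 2:
            split[user] = (items[:-1], items[-1])
    return split


sequences = build_sequences(interactions)
split = leave_one_out(sequences)
print(split["u1"])
```

Each real dataset ships in its own format, so the parsing step would need to be adapted per dataset. The history part of each pair could then be rendered as text for an LLM prompt, or fed as IDs to a conventional sequential model.
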
Some effective open-source projects can be adapted to recommender systems based on Chinese textual data. They are especially useful for individual researchers!
Project | Year |
---|---|
baichuan-7B | 2023 |
YuLan-chat | 2023 |
Chinese-LLaMA-Alpaca | 2023 |
THUDM/ChatGLM-6B | 2023 |
FreedomIntelligence/LLMZoo Phoenix | 2023 |
bloomz-7b1 | 2023 |
LianjiaTech/BELLE | 2023 |
We hope our summary can help your work.