THUIR/ZhihuRec-Dataset

Questions about recommendation with negative feedback


Hello, thank you for offering this helpful dataset to the community!

I am working on negative feedback in recommender systems.
Your paper discusses negative feedback in Section 4.4.2:

When users interact with answers, they give positive and negative feedback on them.
Positive feedback includes users clicking, bookmarking, and liking the answers.

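The negative information available is: the number of times an answer was reported, the number of times an answer was downvoted, the number of times an answer was marked "not helpful", and the number of downvotes received by this user's answers.
But unless I am missing something important, the negative feedback itself (deleting or skipping answers) is not provided in the dataset.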

As described in readme.txt, the dataset contains the following files:

zhihu100M.txt (2.58G) --- user interactions
zhihu20M.txt (529M) --- user interactions
zhihu1M.txt (26.4M) --- user interactions
user_infos.txt (689M) --- features of the users that occur in the dataset
answer_infos.txt (1.32G) --- features of the answers that occur in the dataset
question_infos.txt (30.7M) --- features of the questions that occur in the dataset
author_infos.txt (11.2M) --- features of the authors that occur in the dataset
topic_infos.txt (6.11M) --- features of the topics that occur in the dataset

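The zhihu1M.txt file includes:

  • User ID
  • Number of user-answer interactions
  • User-answer interactions: zero or more entries, separated by half-width (ASCII) commas; each entry has the format DocID|answer show time|answer read time
  • Number of user search actions
  • User search actions: query|search time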
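A minimal Python sketch of reading one record in the format listed above. It is an illustration under assumptions, not the dataset's official loader: the readme excerpt does not say how the five top-level fields are separated (tab is assumed here), whether multiple searches are also comma-separated, or whether times are integers; the names `parse_user_line`, `Interaction`, `Search`, and `UserRecord` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interaction:
    doc_id: str
    show_time: int  # when the answer was shown to the user
    read_time: int  # when it was read; 0 if the user did not read it


@dataclass
class Search:
    query: str
    search_time: int


@dataclass
class UserRecord:
    user_id: str
    interactions: List[Interaction] = field(default_factory=list)
    searches: List[Search] = field(default_factory=list)


def parse_user_line(line: str, sep: str = "\t") -> UserRecord:
    # ASSUMPTION: the five top-level fields are separated by `sep` (tab by default).
    user_id, n_inter, inter_field, n_search, search_field = line.rstrip("\n").split(sep)
    record = UserRecord(user_id)
    # Zero or more "DocID|show_time|read_time" entries, comma-separated;
    # filter(None, ...) drops the empty string produced when there are none.
    for entry in filter(None, inter_field.split(",")):
        doc_id, show, read = entry.split("|")
        record.interactions.append(Interaction(doc_id, int(show), int(read)))
    # ASSUMPTION: searches are also comma-separated and queries contain no "," or "|".
    for entry in filter(None, search_field.split(",")):
        query, t = entry.split("|")
        record.searches.append(Search(query, int(t)))
    # The count fields allow a sanity check of the parse.
    if len(record.interactions) != int(n_inter) or len(record.searches) != int(n_search):
        raise ValueError(f"count mismatch for user {user_id}")
    return record
```

The count fields are used only as a consistency check, which helps catch a wrong separator assumption early.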

The paper refers to [22]:

That work uses Goodbooks-10k (Goodbooks) and MovieLens-1M (ML-1M); both datasets contain user ratings of items on a 1-5 scale.

For each user, similar to [5], they compute the user's mean rating and treat ratings below the mean as negative and ratings equal to or above the mean as positive.
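The per-user mean threshold described for [22] can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not code from [22] or [5]; the function name and the (user, item, rating) tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user_id, item_id, rating)


def label_by_user_mean(ratings: List[Rating]) -> List[Tuple[str, str, int]]:
    """Return (user, item, label): 1 if rating >= that user's mean rating, else 0."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for user, _, r in ratings:
        sums[user] += r
        counts[user] += 1
    means = {u: sums[u] / counts[u] for u in sums}
    # Ratings equal to the mean count as positive, matching the rule above.
    return [(u, i, 1 if r >= means[u] else 0) for u, i, r in ratings]


if __name__ == "__main__":
    data = [("u1", "a", 5), ("u1", "b", 3), ("u1", "c", 1), ("u2", "a", 4), ("u2", "b", 4)]
    print(label_by_user_mean(data))
```

For u1 the mean is 3, so the ratings 5 and 3 become positive and 1 becomes negative; a user who gives every item the same rating ends up with only positives.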

But the ZhihuRec dataset does not provide positive/negative labels for its interactions.

Also, is there any code for running the experiments with RecBole, as listed in Section 4.1?
It would help in reproducing the results in Table 9.

Thanks for your valuable question!

  1. The interaction log between a user and an answer includes the show time and the read time. If the read time is 0, the user skipped the answer, and we treat this as negative feedback.
    As mentioned in the readme file, "If the user didn't read the answer, read time is set to 0."
    As mentioned in the paper, "Negative feedbacks are like the users delete and skip the answers. ".

  2. Please give me your email address and I will send you the RecBole config files.
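The skip rule in point 1 (read time 0 means skipped, hence negative) can be applied directly to the interaction field of a zhihu1M.txt record. A minimal Python sketch, assuming integer times; the function name `label_interactions` is hypothetical:

```python
from typing import List, Tuple


def label_interactions(inter_field: str) -> List[Tuple[str, int, int]]:
    """Turn 'DocID|show_time|read_time,...' into (doc_id, show_time, label) triples.

    label = 1 (positive): the answer was read (read_time != 0)
    label = 0 (negative): the answer was shown but skipped (read_time == 0)
    """
    labeled: List[Tuple[str, int, int]] = []
    if not inter_field:  # the user has no interactions
        return labeled
    for entry in inter_field.split(","):
        doc_id, show, read = entry.split("|")
        # ASSUMPTION: show and read times are integer timestamps.
        labeled.append((doc_id, int(show), 0 if int(read) == 0 else 1))
    return labeled


if __name__ == "__main__":
    print(label_interactions("d1|100|105,d2|110|0"))
```

The show time is kept rather than the read time because skipped answers have a read time of 0, so only the show time orders all interactions consistently.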

Thanks for the clarification. It helps a lot!
My email is h12345jack@gmail.com.

If the config files are not complex, pasting them into this issue would make them easier for others to find.

Thanks for your advice.
config files.zip (attachment)
general.yaml is the configuration file for RecBole.
zhihu1M.inter is the dataset, converted into an atomic file that RecBole can load.
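RecBole atomic files such as `.inter` are tab-separated, with a header whose columns are written as `name:type` (for example `token` or `float`). The sketch below writes labeled interactions in that layout. It is not the contents of the attached zhihu1M.inter: the column names assume RecBole's default field names (`user_id`, `item_id`, `label`, `timestamp`), using the show time as the timestamp is one possible choice, and `write_inter` is a hypothetical helper.

```python
import io
from typing import Iterable, TextIO, Tuple

# (user_id, item_id, label, timestamp)
Row = Tuple[str, str, int, int]

# ASSUMPTION: default RecBole field names; the shipped zhihu1M.inter may differ.
HEADER = ["user_id:token", "item_id:token", "label:float", "timestamp:float"]


def write_inter(rows: Iterable[Row], out: TextIO) -> None:
    """Write rows as a tab-separated, RecBole-style atomic .inter file."""
    out.write("\t".join(HEADER) + "\n")
    for user, item, label, ts in rows:
        out.write(f"{user}\t{item}\t{label}\t{ts}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    # label 1 = answer was read, 0 = shown but skipped (read time 0)
    write_inter([("u1", "d1", 1, 100), ("u1", "d2", 0, 110)], buf)
    print(buf.getvalue(), end="")
```

Keeping the skipped answers as rows with label 0, instead of dropping them, is what lets a model trained on this file see the negative feedback.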
For more on topics such as model selection and hyperparameter tuning, see this RecBole blog post:
https://blog.csdn.net/Turinger_2000/article/details/110395198
The RecBole website is: https://recbole.io/