Maoyan Movie Review Dataset (M-Dataset)

Introduction

  • This dataset comes from the paper "GEK: A Spam Movie Review Detection Model Based on Graph Convolutional Neural Network Embedding of Movie Knowledge".

  • M-Dataset contains spam and non-spam movie review records collected from Maoyan (one of the largest platforms for movie tickets and movie reviews in China).

  • Users of the dataset must not violate Maoyan's privacy protection policy.

  • M-Dataset Statistics

    Name             #Reviews  #Spam   #Non-spam  #Movies  #Genres  Movie Release Year
    Spam Candidates  734,130   --      --         2,352    40       2017-2021
    M-Dataset        65,696    20,092  45,604     457      40       2017-2021
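
The class balance implied by the statistics can be checked with a few lines of Python. This is only a sketch: the counts are copied from the M-Dataset row above, and the variable names are illustrative rather than part of any released loader.

```python
# Counts copied from the M-Dataset row of the statistics table.
n_reviews = 65_696
n_spam = 20_092
n_non_spam = 45_604

# The two classes should add up to the total number of reviews.
assert n_spam + n_non_spam == n_reviews

# Share of each class in the labeled dataset.
spam_ratio = n_spam / n_reviews
print(f"spam: {spam_ratio:.1%}, non-spam: {1 - spam_ratio:.1%}")
```

The labeled set is imbalanced toward non-spam, which may matter when choosing evaluation metrics or sampling strategies.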

Labeling Strategy

  • The content of the review has absolutely nothing to do with the movie being reviewed;

  • The review text and the rating do not match: the text is critical but the score is high, or the text is full of praise but the score is low;

  • The review is exaggerated and excessively flattering, full of empty adjectives and pure praise, without mentioning any shortcomings;

  • The review content is formulaic, and many similar reviews appear under other movies;

  • The review gives an excessively high or low rating simply because the reviewer likes or dislikes a certain movie star or character.

A review is labeled as spam if it meets one or more of the above criteria.
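
The any-of rule above can be sketched in Python as follows. The five criteria are judged by human annotators, so the sketch only shows how per-criterion judgments could combine into a final label. The `CriteriaJudgment` class and its field names are hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass, fields


@dataclass
class CriteriaJudgment:
    """Annotator's yes/no judgment per labeling criterion (hypothetical names)."""
    unrelated_to_movie: bool = False    # first criterion in the list
    text_rating_mismatch: bool = False  # second criterion
    exaggerated_praise: bool = False    # third criterion
    formulaic_content: bool = False     # fourth criterion
    star_driven_rating: bool = False    # fifth criterion


def is_spam(judgment: CriteriaJudgment) -> bool:
    """Return True if at least one criterion is met."""
    return any(getattr(judgment, f.name) for f in fields(judgment))


print(is_spam(CriteriaJudgment()))                         # no criterion met
print(is_spam(CriteriaJudgment(exaggerated_praise=True)))  # one criterion met
```

Keeping one flag per criterion, rather than a single spam bit, would also preserve which criterion triggered the label, though the source does not state whether that information is recorded.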

Details

  • The detailed information fields of a review include:

    it will be updated soon...

  • The detailed information fields of a movie introduction include:

    it will be updated soon...

  • The detailed information fields of a long comment include:

    it will be updated soon...