op_spam_AI

AI-generated reviews mirroring the famous Myle Ott's opinion spam dataset This dataset can be used to see if it's possible to distinguish automatically generated reviews from original ones (and fake ones written by humans).

Overview

This corpus consists of AI-generated hotel reviews of 20 Chicago hotels. These reviews were created by feeding to GPT-3 (or Llama) the first 10 words of truthful reviews from the Myle Ott Opinion Spam corpus and asking GPT-3 to complete for a length comparable to the length of the original review.

For the original Myle Ott corpus please download it here: https://myleott.com/op_spam_v1.4.zip

This corpus contains:

400 generated positive reviews
400 generated negative reviews

Each of the above datasets consist of 20 reviews for each of the 20 most popular Chicago hotels. The files are named according to the following conventions:

Files are named according to the format used in the Myle Ott corpus: %c_%h_%i.txt, where:

%c denotes the class: (t)ruthful, (d)eceptive or (g)enerated (in our case all are g)

%h denotes the hotel:

affinia: Affinia Chicago (now MileNorth, A Chicago Hotel)
allegro: Hotel Allegro Chicago - a Kimpton Hotel
amalfi: Amalfi Hotel Chicago
ambassador: Ambassador East Hotel (now PUBLIC Chicago)
conrad: Conrad Chicago
fairmont: Fairmont Chicago Millennium Park
hardrock: Hard Rock Hotel Chicago
hilton: Hilton Chicago
homewood: Homewood Suites by Hilton Chicago Downtown
hyatt: Hyatt Regency Chicago
intercontinental: InterContinental Chicago
james: James Chicago
knickerbocker: Millennium Knickerbocker Hotel Chicago
monaco: Hotel Monaco Chicago - a Kimpton Hotel
omni: Omni Chicago Hotel
palmer: The Palmer House Hilton
sheraton: Sheraton Chicago Hotel and Towers
sofitel: Sofitel Chicago Water Tower
swissotel: Swissotel Chicago
talbott: The Talbott Hotel

%i serves as a counter to make the filename unique

The paraphrased.zip file contains the same GPT-3 reviews paraphrased via the T5 paraphraser: https://huggingface.co/humarin/ chatgpt_paraphraser_on_T5_base

References

[1] M. Ott, Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.

[2] M. Ott, C. Cardie, and J.T. Hancock. 2013. Negative Deceptive Opinion Spam. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

dbuscaldi/op_spam_AI

op_spam_AI

Overview

References