kang205/SASRec

'review' dataset not correct

Cliu2 opened this issue · 2 comments

Cliu2 commented

In the 'steam_reviews' dataset, the 'recommend' property of all samples is 'True'.

Could that be caused by crawling bugs during data collection?

http://cseweb.ucsd.edu/~wckang/steam_reviews.json.gz
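For anyone who wants to reproduce the observation, here is a minimal sketch that tallies the values of the 'recommend' field. It assumes the file has been downloaded locally and holds one record per line, either as strict JSON or as a Python dict literal (I have not confirmed the exact line format, so the parser tries both). The function names and the local path are hypothetical.

```python
import ast
import gzip
import json
from collections import Counter


def parse_record(line):
    """Parse one line as strict JSON, falling back to a Python dict literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Assumption: non-JSON lines are plain Python literals, e.g. {u'recommend': True}
        return ast.literal_eval(line)


def count_field_values(path, field="recommend"):
    """Count how often each value of `field` occurs in a gzipped, line-delimited file."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = parse_record(line)
            counts[record.get(field, "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical local path to the downloaded file.
    print(count_field_values("steam_reviews.json.gz"))
```

If the counter comes back with a single key, every sample carries the same 'recommend' value.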

Thanks for letting me know. You're right: all the "recommend" entries are "True", most likely because of a bug in parsing the page. I'm not sure whether I can re-crawl the data, but I will post a clarification so that others are aware of this. Thanks!

Cliu2 commented

Thank you. One more question: I would like to use the Steam dataset. Is there any way to recover the detailed item/user information from the ids in the training data file?

https://github.com/kang205/SASRec/blob/641c378fcfac265ea8d1e5fe51d4d53eb892d1b4/data/Steam.txt

(For example, a mapping from the 1-based ids to the actual user_id/product_id.)
Alternatively, it would be just as helpful if the pre-processing script could be shared.
Thanks in advance.
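To make the request concrete, this is the kind of mapping I mean. It is only a sketch of how 1-based ids could be assigned and inverted when preprocessing the raw reviews oneself. It is not the script used for this repository, and the first-seen ordering and the names `build_id_maps` and `invert` are my own assumptions, so it cannot recover the ids already in Steam.txt.

```python
def build_id_maps(interactions):
    """Assign 1-based integer ids to raw user and item ids in first-seen order.

    `interactions` is an iterable of (raw_user_id, raw_item_id) pairs.
    Assumption: ids are handed out in order of first appearance; the real
    preprocessing may use a different order.
    """
    user_to_idx, item_to_idx = {}, {}
    remapped = []
    for raw_user, raw_item in interactions:
        # setdefault keeps an existing id and only assigns a new one on first sight
        u = user_to_idx.setdefault(raw_user, len(user_to_idx) + 1)
        i = item_to_idx.setdefault(raw_item, len(item_to_idx) + 1)
        remapped.append((u, i))
    return remapped, user_to_idx, item_to_idx


def invert(mapping):
    """Turn a raw-id -> integer-id dict into an integer-id -> raw-id dict."""
    return {idx: raw for raw, idx in mapping.items()}


if __name__ == "__main__":
    # Made-up example pairs, not taken from the dataset.
    pairs = [("alice", "p10"), ("bob", "p10"), ("alice", "p20")]
    remapped, users, items = build_id_maps(pairs)
    print(remapped)
    print(invert(users), invert(items))
```

Saving the two inverted dictionaries next to the training file would be enough to look up the original user_id/product_id later.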