RecommenderSystem-DataSet

This repository contains datasets that I have collected for recommender systems research.

Recommender System DataSet

These datasets are widely used in recommender systems research and can serve as baselines.

  • Douban An anonymized Douban dataset containing 129,490 unique users and 58,541 unique movie items.
  • Epinions Epinions is a website where people can review products.
  • Flixster Flixster is a social movie site allowing users to share movie ratings, discover new movies and meet others with similar movie taste.
  • CiaoDVD CiaoDVD is a dataset crawled from the entire DVD category of the dvd.ciao.co.uk website in December 2013.
  • MACLab With the text in the post, the mood tag, and the music title, this project is aimed at studying the users' moods and music emotions.
  • DEAPdataset A dataset for emotion analysis using EEG, physiological, and video signals.
  • MyPersonalityDataset myPersonality was a popular Facebook application that allowed users to take real psychometric tests, and allowed us to record (with consent!) their psychological and Facebook profiles. Currently, our database contains more than 6,000,000 test results, together with more than 4,000,000 individual Facebook profiles.
  • Bibsonomy Tag Recommendations in Social Bookmarking Systems.
  • Delicious plista News Recommendation Dataset and Delicious.
  • Movielens Stable benchmark dataset. 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags.
  • Jester Anonymous Ratings from the Jester Online Joke Recommender System.
  • BookCrossing Book-Crossing Dataset.
  • LastFM 92,800 artist listening records from 1,892 users.
  • Wikipedia Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries.
  • OpenStreetMap The files found here are complete copies of the OpenStreetMap.org database, including editing history. They are published under the Open Data Commons Open Database License 1.0.
  • PythonGitCode Hermes is Lab41's foray into recommender systems. It explores how to choose a recommender system for a new application by analyzing the performance of multiple recommender system algorithms on a variety of datasets.
  • Gist Recommendation and Ratings Public Data Sets For Machine Learning.
  • Yelp The Yelp dataset is a subset of our businesses, reviews, and user data for use in personal, educational, and academic purposes. Available as both JSON and SQL files, it can be used to teach students about databases, to learn NLP, or as sample production data while you learn how to make mobile apps.
  • AmazonReviews This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).
  • CiteULike The CiteULike database is potentially useful for researchers in various fields. Physicists and computer scientists have expressed an interest in trying to analyse the structure of the data, and frequently ask for datasets to be made available. Previously this was done on an ad-hoc basis, and it relied on us remembering to update the data file. Now, there is an automatic process which runs every night producing a snapshot summary of what articles have been posted with which tags.
  • Taobao The dataset contains anonymized users' shopping logs from the 6 months before and on the "Double 11" day, and label information indicating whether they are repeat buyers. Due to privacy concerns, the data is sampled in a biased way, so statistics computed on this dataset will deviate from the actual figures of Tmall.com.
  • More datasets to be added.

The table below lists some statistics for the datasets above.

| Data Set            | Users   | Items   | Ratings (Scale)        | Density  | Users (link data) | Links (Type)           |
|---------------------|---------|---------|------------------------|----------|-------------------|------------------------|
| Ciao                | 7,375   | 99,746  | 278,483--[1, 5]        | 0.0379%  | 7,375             | 111,781--Trust         |
| Douban              | 129,490 | 58,541  | 16,830,839--[1, 5]     | 0.222%   | 129,490           | 1,692,952--Friendship  |
| Epinions (665K)     | 40,163  | 139,738 | 664,824--[1, 5]        | 0.0118%  | 49,289            | 487,183--Trust         |
| Epinions (510K)     | 71,002  | 104,356 | 508,960--[1, 5]        | 0.00687% |                   | Trust                  |
| Epinions (Extended) | 120,492 | 755,760 | 13,668,320--[1, 5]     | 0.015%   |                   | Trust, Distrust        |
| Flixster            | 147,612 | 48,794  | 8,196,077--[0.5, 5.0]  | 0.1138%  | 787,213           | 11,794,648--Friendship |
| FilmTrust           | 1,508   | 2,071   | 35,497--[0.5, 4.0]     | 1.14%    | 1,642             | 1,853--Trust           |
| Jester              | 59,132  | 140     | 1,761,439--Explicit    | 21.28%   |                   |                        |
| MovieLens 100K      | 943     | 1,682   | 100,000--[1, 5]        | 6.30%    |                   |                        |
| MovieLens 1M        | 6,040   | 3,706   | 1,000,209--[1, 5]      | 4.47%    |                   |                        |
| MovieLens 10M       | 71,567  | 10,681  | 10,000,054--[1, 5]     | 1.308%   |                   |                        |
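
The Density column in the table above appears to be the number of ratings divided by the number of possible user-item pairs (users × items), expressed as a percentage. That reading is inferred from the figures rather than stated in this repository, and the helper name `rating_density` is hypothetical. A minimal sketch that reproduces a few of the rows under that assumption:

```python
def rating_density(n_users: int, n_items: int, n_ratings: int) -> float:
    """Return the share of user-item pairs that carry a rating, in percent.

    Assumption: density = ratings / (users * items), inferred from the table.
    """
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    return 100.0 * n_ratings / (n_users * n_items)


# (users, items, ratings) copied from the statistics table.
stats = {
    "MovieLens 100K": (943, 1_682, 100_000),
    "FilmTrust": (1_508, 2_071, 35_497),
    "Jester": (59_132, 140, 1_761_439),
}

densities = {name: rating_density(*row) for name, row in stats.items()}

for name, density in densities.items():
    print(f"{name}: {density:.2f}%")
# MovieLens 100K: 6.30%
# FilmTrust: 1.14%
# Jester: 21.28%
```

The same function can be used to sanity-check a dataset after loading it: if the density computed from your own user, item, and rating counts differs noticeably from the table, the file may have been filtered or deduplicated differently.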