LINKS : https://www.kaggle.com/jealousleopard/goodreadsbooks
INTRODUCTION : The dataset describes books on the Goodreads website: their authors, ratings, and number of visits. We plan to take different columns and run mapper and reducer jobs on them, with the aim of increasing the number of users visiting the website.
DATA SOURCE : The dataset is in CSV format (structured). The key attributes we are interested in: Title, Author, Average Ratings.
Volume : The dataset has 10 columns and 13,720 records (1 MB).
Variety : The dataset is structured and in CSV format.
Velocity : The velocity is low because it records data for each person.
Veracity : This is a clean dataset.
Value : I want to find the best-selling authors and most-rated books, along with the most-reviewed genres, and then the most-reviewed authors and highest-rated books.
BIG DATA PROBLEMS :
1) Manasa Ginjupalli: for each author, find the average rating (the average across all of the author's books).
2) Azhar Alali: for each title, find the highest rating.
3) Farheen Mohammad: for each author, find the count of pages.
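
As a rough illustration of the mapper and reducer idea, the sketch below works through the first problem above (average rating per author) in plain Python, without a Hadoop cluster. The sample rows and the column names `authors`, `average_rating`, and `num_pages` are assumptions made for the example and may not match the headers in the actual CSV file; `mapper`, `reducer`, and `run_job` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The column names are assumptions and may
# differ from the headers in the real Goodreads CSV.
SAMPLE_CSV = """title,authors,average_rating,num_pages
Book A,Author X,4.0,300
Book B,Author X,3.0,200
Book C,Author Y,4.5,150
"""


def mapper(row):
    """Emit (author, (rating, 1)) for one row; skip rows that cannot be parsed."""
    try:
        author = row["authors"]
        rating = float(row["average_rating"])
    except (KeyError, TypeError, ValueError):
        return
    yield author, (rating, 1)


def reducer(author, values):
    """Combine all (rating, count) pairs for one author into a mean rating."""
    total = sum(rating for rating, _ in values)
    count = sum(n for _, n in values)
    return author, total / count


def run_job(csv_text):
    """Simulate map, shuffle (group by key), and reduce in a single process."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for key, value in mapper(row):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


averages = run_job(SAMPLE_CSV)
for author, mean_rating in sorted(averages.items()):
    print(author, mean_rating)
```

The other two problems would likely follow the same shape: keying on the title and taking the maximum rating in the reducer, or keying on the author and summing a page-count column. On a real cluster the grouping step is done by the framework rather than by `run_job`.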