Big-Data-Project

GOODREADS ANALYSIS

Azhar Alali

Manasa Ginjupalli

Farheen Mohammad

LINK : https://www.kaggle.com/jealousleopard/goodreadsbooks

INTRODUCTION : The dataset describes books on the Goodreads website: their authors, ratings, and number of visits. We plan to select several columns and run mapper and reducer (MapReduce) jobs over them, aiming to find insights that could help increase the number of users visiting the website.

DATA SOURCE : The dataset is a structured CSV file (it can be opened in Excel). The key attributes we are interested in are Title, Author, and Average Rating.

Volume : The dataset has 10 columns and 13,720 records (about 1 MB).

Variety : The data is structured and stored in CSV format.

Velocity : Velocity is slow, because the data is recorded for each person.

Veracity : The dataset is clean.

Value : I want to find the best-selling authors, the most-rated books, and the most-reviewed genres, and then the most-reviewed authors and the highest-rated books.
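A minimal sketch of pulling the key attributes (Title, Author, Average Rating) out of the CSV with Python's standard `csv` module. The header names `title`, `authors`, and `average_rating` are assumptions about the Kaggle file and may need adjusting, and `load_key_attributes` is a hypothetical helper, not part of the project.

```python
import csv
import io

# Assumed header names in the Kaggle Goodreads CSV; adjust them if the file differs.
TITLE_COL = "title"
AUTHOR_COL = "authors"
RATING_COL = "average_rating"


def load_key_attributes(csv_file):
    """Yield (title, authors, average_rating) tuples from a Goodreads-style CSV."""
    for row in csv.DictReader(csv_file):
        try:
            rating = float(row[RATING_COL])
        except (TypeError, ValueError):
            continue  # skip rows whose rating is missing or not a number
        yield row[TITLE_COL].strip(), row[AUTHOR_COL].strip(), rating


if __name__ == "__main__":
    # Tiny made-up sample; a quoted title may itself contain commas.
    sample = io.StringIO(
        "bookID,title,authors,average_rating\n"
        "1,Book A,Author X,4.0\n"
        '2,"Book, With Comma",Author Y/Author Z,3.5\n'
    )
    for record in load_key_attributes(sample):
        print(record)
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in; `DictReader` handles quoted fields, so titles containing commas are read as a single column.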

BIG DATA PROBLEMS :

1) Manasa Ginjupalli: for each author, find the average rating (the average over all of their books).

2) Azhar Alali: for each title, find the highest rating.

3) Farheen Mohammad: for each author, find the total number of pages.
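The three problems above can be sketched as MapReduce jobs. This is a minimal in-memory simulation in Python, not the project's actual Hadoop code: `run_job` stands in for the map, shuffle, and reduce phases, and the mapper names are hypothetical. It assumes each record is a `(title, authors, average_rating, num_pages)` tuple, that co-authors are separated by `/`, and, for problem 2, that the same title can appear in more than one row (for example, several editions). The sample records are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job: map, shuffle (sort + group by key), reduce."""
    pairs = [kv for record in records for kv in mapper(record)]
    pairs.sort(key=itemgetter(0))  # shuffle phase: bring equal keys together
    return {
        key: reducer([value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    }


# Problem 1: average rating per author (mean over all of the author's books).
def map_author_rating(record):
    _, authors, rating, _ = record
    for author in authors.split("/"):  # assumption: co-authors separated by "/"
        yield author.strip(), rating


def reduce_average(values):
    return sum(values) / len(values)


# Problem 2: highest rating per title (emit every row's rating, reduce with max).
def map_title_rating(record):
    title, _, rating, _ = record
    yield title, rating


# Problem 3: total page count per author (emit pages, reduce with sum).
def map_author_pages(record):
    _, authors, _, pages = record
    for author in authors.split("/"):
        yield author.strip(), pages


if __name__ == "__main__":
    # Made-up toy records: (title, authors, average_rating, num_pages).
    books = [
        ("Book A", "Author X", 4.0, 300),
        ("Book B", "Author X/Author Y", 3.0, 200),
        ("Book A", "Author Z", 4.5, 310),
    ]
    print(run_job(books, map_author_rating, reduce_average))
    # {'Author X': 3.5, 'Author Y': 3.0, 'Author Z': 4.5}
    print(run_job(books, map_title_rating, max))
    # {'Book A': 4.5, 'Book B': 3.0}
    print(run_job(books, map_author_pages, sum))
    # {'Author X': 500, 'Author Y': 200, 'Author Z': 310}
```

Because each reducer is just a function over a list of values, problems 2 and 3 can reuse the built-ins `max` and `sum`. On a real Hadoop Streaming cluster, the mapper and reducer would instead be separate scripts that read tab-separated lines from standard input, with Hadoop performing the sort between them.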