
Big Data course project - UNITN

Primary language: Java
License: GNU General Public License v3.0 (GPL-3.0)

BigDataProject

Social (Twitter) data analysis for user profiling

Create a distributed MapReduce program in Spark that processes a large Twitter dataset and generates a set of user profiles. A profile is a vector of terms for each user. Tweets may first need to be enriched or cleaned; they are then clustered, and finally a profile is generated for each user. The main point of the project is the design of the distributed tweet-processing task in Spark. Some techniques from the literature will be provided.
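As a rough illustration of the cleaning and profile-generation steps, the sketch below builds one term-frequency vector per user in plain, single-machine Java. It is not the project's implementation and does not use Spark. The class `UserProfiles`, the `Tweet` type, the methods `cleanAndTokenize` and `buildProfiles`, and the tiny stop-word list are all hypothetical. The clustering step is omitted. The per-tweet expansion into `(user, term)` pairs plays the role of the map phase, and summing counts per key plays the role of the reduce phase. In Spark's Java API this would correspond roughly to `flatMapToPair` over the tweets followed by `reduceByKey`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-machine sketch of per-user term-frequency profiles.
// Not Spark code: it only mirrors the map (emit pairs) and reduce (sum per key) shape.
final class UserProfiles {

    /** Minimal tweet record (hypothetical): author handle plus raw text. */
    static final class Tweet {
        final String user;
        final String text;

        Tweet(String user, String text) {
            this.user = user;
            this.text = text;
        }
    }

    // Assumed, deliberately tiny stop-word list; a real pipeline would use a fuller one.
    static final Set<String> STOPWORDS = Set.of("a", "an", "the", "is", "and", "of", "to", "rt");

    /** Cleaning step: lowercase, strip URLs and @mentions, split into tokens, drop stop words. */
    static List<String> cleanAndTokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("https?://\\S+", " ")   // remove links
                .replaceAll("@\\w+", " ");          // remove user mentions
        return Arrays.stream(cleaned.split("[^a-z0-9#]+")) // keep letters, digits and hashtags
                .filter(t -> !t.isEmpty() && !STOPWORDS.contains(t))
                .collect(Collectors.toList());
    }

    /** Builds user -> (term -> count); TreeMaps keep the output sorted and readable. */
    static Map<String, Map<String, Long>> buildProfiles(List<Tweet> tweets) {
        Map<String, Map<String, Long>> profiles = new TreeMap<>();
        for (Tweet tw : tweets) {
            // "Map" phase: one tweet expands into many (user, term) pairs.
            for (String term : cleanAndTokenize(tw.text)) {
                // "Reduce" phase: sum the counts for each (user, term) key.
                profiles.computeIfAbsent(tw.user, u -> new TreeMap<>())
                        .merge(term, 1L, Long::sum);
            }
        }
        return profiles;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(
                new Tweet("alice", "RT @bob: Spark is fast and Spark scales https://t.co/x"),
                new Tweet("bob", "Learning #BigData at UNITN"));
        // prints {alice={fast=1, scales=1, spark=2}, bob={#bigdata=1, at=1, learning=1, unitn=1}}
        System.out.println(buildProfiles(tweets));
    }
}
```

Keeping hashtags as terms (`#bigdata`) is one possible choice, since they often carry topical signal; a real pipeline might also apply stemming or TF-IDF weighting before clustering.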