This repository is not active
ignaciomolina/hbaseTrending
The goal of this assignment is to implement a Java application that stores Twitter trending topics in HBase and provides a set of queries for analysing them. The trending topics to load into HBase are read from text files in the same format used to store the results of the project 1 assignment: one file per language, where each line is a CSV record of the form “timestamp_ms, lang, tophashtag1, frequencyhashtag1, tophashtag2, frequencyhashtag2, tophashtag3, frequencyhashtag3”.

The query set consists of three queries. In all of them the time interval is defined by a start and an end timestamp, both in milliseconds.

1. Given a language (lang), find the Top-N most used words for that language in the time interval.
2. Find the Top-N most used words for each language in the time interval.
3. Find the Top-N most used words, together with the frequency of each word, regardless of language in the time interval.
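To make the record format and the query semantics concrete, here is a minimal in-memory sketch in plain Java. It does not use HBase and is not taken from this repository: the class `TrendingQuerySketch`, its nested `Entry` type, and the methods `parseLine` and `topN` are hypothetical names. It also assumes that both interval bounds are inclusive and that a word's rank is the sum of its frequencies over all matching lines, neither of which is stated in the assignment text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the repository.
final class TrendingQuerySketch {

    // One parsed input line: timestamp, language, and hashtag -> frequency.
    static final class Entry {
        final long timestampMs;
        final String lang;
        final Map<String, Long> frequencies = new HashMap<>();

        Entry(long timestampMs, String lang) {
            this.timestampMs = timestampMs;
            this.lang = lang;
        }
    }

    // Parses "timestamp_ms, lang, tag1, freq1, tag2, freq2, tag3, freq3".
    static Entry parseLine(String line) {
        String[] fields = line.split(",");
        if (fields.length < 2 || (fields.length - 2) % 2 != 0) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        Entry entry = new Entry(Long.parseLong(fields[0].trim()), fields[1].trim());
        for (int i = 2; i + 1 < fields.length; i += 2) {
            long freq = Long.parseLong(fields[i + 1].trim());
            entry.frequencies.merge(fields[i].trim(), freq, Long::sum);
        }
        return entry;
    }

    // Returns the n words with the highest summed frequency in
    // [startMs, endMs] (assumed inclusive). A null lang means "all
    // languages", which corresponds to query 3; a non-null lang to query 1.
    static List<Map.Entry<String, Long>> topN(
            List<Entry> entries, String lang, long startMs, long endMs, int n) {
        Map<String, Long> totals = new HashMap<>();
        for (Entry e : entries) {
            if (e.timestampMs < startMs || e.timestampMs > endMs) {
                continue; // outside the requested interval
            }
            if (lang != null && !lang.equals(e.lang)) {
                continue; // different language
            }
            e.frequencies.forEach((tag, freq) -> totals.merge(tag, freq, Long::sum));
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(totals.entrySet());
        // Highest frequency first; ties broken alphabetically for determinism.
        ranked.sort((a, b) -> {
            int byFreq = Long.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```

Under these assumptions, query 2 amounts to calling `topN` once per distinct language. In an HBase-backed version, the loop over `entries` would presumably be replaced by a scan over a row-key range, but the aggregation and ranking steps would look much the same.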
Java