The HBase project at Technical University of Madrid. The goal is to support various top N queries on trending Twitter topics.
- Key - timestamp
- Column family - each row has several languages, such as en, es and it
- Column - each column family has topHashTag1, topHashTag2, topHashTag3, topHashTag1Freq, topHashTag2Freq, topHashTag3Freq
- Given language and time interval ([startTimesatmp, endTimestamp)), performing range scan with column family (language) and using a StreamTopK object to store and retrieve top N words
- Given a list of languages and time interval, performing range scan with column families (languages) and using a set of StreamTopK objects to store and retrieve top N words for each language
- Given time interval, performing range scan without column family and using a StreamTopK object to store and retrieve top N words for all languages
- The key point here is to traverse scan result using keySet() to get column families (all languages) instead of predefined languages
The program was tested in hbase standalone mode and hbase distributed mode with hbase ZooKeeper.