In this project, we create a tag cloud visualization based on Google Books Ngram. The main idea is to scale each word's size by its chronological relevance [1] and to show word polarity (positive or negative sentiment) with colours. We use the following libraries and dataset:
- To obtain chronological data, we use Google Books Ngram [2], which allows us to download data through a simple API.
- To render the tag cloud, we built a web application in HTML5 and JavaScript. It relies mainly on D3.js [3] in combination with a word cloud library [4], which is based on D3’s force layout [5].
- To colour the words according to their sentiment, we use a Python library for sentiment analysis [6] that makes use of WordNet [6] (a large lexical database of English). The sentiment function returns a word's polarity value, which ranges from -1 (very negative) to 1 (very positive).
- To generate the data for the visualization, we use a simple PHP server. Through the PHP exec() function, the server invokes the Python scripts and then preprocesses their output. Finally, it sends the client a JSON file with the data ready to be visualized.

Requirements:
- Python 2.x or later
- pandas (Python library)
- Pattern (Python library)
- PHP server
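
The size and colour mapping described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual scripts: the function names, the red/grey/green palette, the 12 to 72 pixel font range, and the JSON keys `text`, `size`, and `color` are all hypothetical. Frequencies and polarities are passed in as plain numbers so the sketch runs without Pattern or a Google Books Ngram download.

```python
import json


def polarity_to_color(polarity):
    """Map a polarity in [-1, 1] to a hex colour.

    Assumed palette: red for negative, grey for neutral, green for positive.
    """
    p = max(-1.0, min(1.0, float(polarity)))  # clamp out-of-range values
    neutral = (128, 128, 128)
    target = (0, 160, 0) if p >= 0 else (200, 0, 0)
    t = abs(p)  # 0 keeps the neutral grey, 1 reaches the full target colour
    rgb = [int(round(n + (c - n) * t)) for n, c in zip(neutral, target)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def scale_sizes(frequencies, min_px=12.0, max_px=72.0):
    """Linearly rescale word frequencies to font sizes in pixels."""
    lo = min(frequencies.values())
    hi = max(frequencies.values())
    span = float(hi - lo) or 1.0  # avoid division by zero when all values are equal
    return {
        word: min_px + (freq - lo) / span * (max_px - min_px)
        for word, freq in frequencies.items()
    }


def build_payload(frequencies, polarities):
    """Build the JSON string a client could consume (key names are assumed)."""
    sizes = scale_sizes(frequencies)
    words = [
        {
            "text": word,
            "size": round(sizes[word], 1),
            # Words without a known polarity are treated as neutral.
            "color": polarity_to_color(polarities.get(word, 0.0)),
        }
        for word in sorted(frequencies)
    ]
    return json.dumps(words)


if __name__ == "__main__":
    # Made-up sample values for one year; real values would come from the
    # Ngram data and the sentiment library.
    sample_freq = {"war": 3.0e-4, "peace": 1.0e-4, "table": 2.0e-4}
    sample_polarity = {"war": -0.6, "peace": 0.8}
    print(build_payload(sample_freq, sample_polarity))
```

A linear scale keeps the example short; a logarithmic or square-root scale may suit word frequencies better, since a few very common words otherwise dominate the cloud.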