
Analysis of the text of Indian PM Modi's Mann ki Baat since the coronavirus pandemic started.


mann_ki_baat

Indian PM Narendra Modi addresses the public through Mann ki Baat, a radio programme that runs for roughly an hour. The name translates to Inner Thoughts or, more literally, Heart's Talk.

The first case in India was reported on 30th January 2020. By the next Mann ki Baat episode on 23rd February 2020, India had only 3 coronavirus cases, so there was no real need for the programme to focus on the virus. From that point onward, however, the number of cases grew rapidly, and by the following episode India had almost 1000 cases, which warranted action from the government.

This code cleans the data and prepares a word cloud for the programme, to show which topics were talked about most as the year progressed. It removes special characters, numbers, extra spaces, and the most common and less relevant words from the speech, converts it to lower case, and then adds the text to the text area a few lines at a time, to give a sense of the topics changing as the programme progresses.
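The character-level part of that cleaning can be sketched as below. This is a minimal illustration, not the contents of text_cleaner.js: the function name `cleanText` and the exact regular expressions are assumptions.

```javascript
// Hypothetical helper: the name and the exact rules are assumptions,
// not taken from text_cleaner.js.
function cleanText(raw) {
  return raw
    .toLowerCase()              // normalise case first so the filter below only needs a-z
    .replace(/[^a-z\s]/g, ' ')  // replace digits, punctuation and other special characters with a space
    .replace(/\s+/g, ' ')       // collapse runs of whitespace (including newlines) into one space
    .trim();                    // drop leading and trailing spaces
}
```

Replacing unwanted characters with a space rather than deleting them keeps neighbouring words from being glued together, at the cost of splitting contractions such as "don't" into two tokens.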

To show different perspectives, there are two word clouds. One combines the text of all programmes since March and animates a single word cloud. The other builds the same kind of word cloud for each individual episode, so we can see the trend as a whole as well as how it changed over time as the cases increased.
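One way to drive such an animation is to split the cleaned text into small batches of lines and keep a running word count after each batch, so every batch becomes one frame of the cloud. The sketch below assumes this approach; `chunkLines`, `cumulativeCounts`, and the batch size are hypothetical and may differ from what the scripts in this repository do.

```javascript
// Split text into batches of `size` non-empty lines (hypothetical helper).
function chunkLines(text, size) {
  const lines = text.split('\n').filter((line) => line.trim() !== '');
  const chunks = [];
  for (let i = 0; i < lines.length; i += size) {
    chunks.push(lines.slice(i, i + size).join(' '));
  }
  return chunks;
}

// Return one snapshot of the running word counts per batch.
function cumulativeCounts(chunks) {
  const counts = new Map();
  return chunks.map((chunk) => {
    for (const word of chunk.split(/\s+/).filter(Boolean)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
    return new Map(counts); // copy so each frame is an independent snapshot
  });
}
```

For the combined cloud, the text of all episodes would be joined before calling `chunkLines`; for the per-episode clouds, the same two functions would be run on each episode separately.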

Watch the videos here:

File Structure

|____raw/ Contains text taken verbatim from pmindia.gov.in

|____clean_1/ Text with Hindi words, headings from dialogues, etc. removed manually

|____clean_2/ Text with special characters, numbers, repeated characters, etc. removed

|____clean_3/ Text with small words, along with some others, removed

|____blocklist.txt List of words that are not small but are insignificant for identifying the topic

|____safelist.txt List of important words (contextual words, nouns, etc.) to keep

|____text_cleaner.js JS file that does the actual cleaning from raw/ folder until clean_3/

|____app.js Sample server to create a webpage where the words can be animated in word cloud

|____combine_divide_animate.js Takes the text of all speeches and animates it over time

|____divide_animate_single.js Opens each episode's file in turn and builds its word cloud
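The roles of blocklist.txt and safelist.txt in the step from clean_2/ to clean_3/ can be sketched as follows. This is an illustration under stated assumptions: the function name `filterWords`, the minimum length of 4, and the rule that the safelist takes precedence over the blocklist are all hypothetical, not read from text_cleaner.js.

```javascript
// Hypothetical filter: keeps safelisted words, drops blocklisted words,
// and drops any remaining word shorter than minLength (an assumed threshold).
function filterWords(text, blocklist, safelist, minLength = 4) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .filter((word) => {
      if (safelist.has(word)) return true;    // important words are always kept
      if (blocklist.has(word)) return false;  // longer but insignificant words are dropped
      return word.length >= minLength;        // small words are dropped
    })
    .join(' ');
}
```

A call such as `filterWords(text, new Set(blockWords), new Set(safeWords))` would take the two sets from the lines of the respective list files; using `Set` keeps each lookup constant-time even for long lists.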

Data Sources

Disclaimer

I've tried to be as unbiased as possible, but because I clean the data and choose the words to add or remove manually, there is likely some bias in the result. Please feel free to open a pull request to improve this tool in any way.