Python scripts for the corresponding blog post: http://vprusso.github.io/blog/2017/data-driven-south-park/
This repository consists of 70,000 lines of dialog extracted from "South Park" (courtesy of Ksenia Sukhova over at Kaggle) as well as two Python scripts:
- south_park_analyze.py: A number of functions to determine how often a given word or series of words is used by a specific character over a given set of seasons and episodes.
- south_park_plots.py: All of the plots generated in the blog post may be replicated with this file.
Example usage of south_park_analyze.py is shown at the bottom of the file. Uncommenting the following line:
# Question: How many times is the word “dude” said in all of the seasons 1 to 18:
#print ( word_count_by_season_and_episode(df, word="dude") )
will print out how many times the word "dude" is said in "South Park" across seasons 1 through 18. Further usage examples can be found in the corresponding blog post.
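For readers who want a feel for this kind of count without opening the script, here is a minimal, dependency-free sketch. It is not the code from south_park_analyze.py: the helper names `count_word` and `count_word_by_season`, the sample rows, and the field names `Season`, `Episode`, `Character`, and `Line` are all assumptions made for illustration, and the real script operates on a DataFrame `df` rather than a list of dictionaries.

```python
import re
from collections import Counter

# Hypothetical sample rows. The field names (Season, Episode, Character, Line)
# are assumed for illustration and may not match the dataset's actual columns.
SAMPLE_ROWS = [
    {"Season": 1, "Episode": 1, "Character": "Stan", "Line": "Dude, this is pretty messed up."},
    {"Season": 1, "Episode": 2, "Character": "Kyle", "Line": "Dude! Dude, come on."},
    {"Season": 2, "Episode": 1, "Character": "Cartman", "Line": "Screw you guys, I'm going home."},
    {"Season": 2, "Episode": 3, "Character": "Stan", "Line": "No way, dude."},
]


def _word_pattern(word):
    """Build a case-insensitive, whole-word pattern for a word or phrase."""
    return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)


def count_word(rows, word, seasons=None, character=None):
    """Count occurrences of `word`, optionally limited to some seasons or one character."""
    pattern = _word_pattern(word)
    total = 0
    for row in rows:
        if seasons is not None and row["Season"] not in seasons:
            continue
        if character is not None and row["Character"] != character:
            continue
        total += len(pattern.findall(row["Line"]))
    return total


def count_word_by_season(rows, word):
    """Return a {season: count} mapping of occurrences of `word`."""
    pattern = _word_pattern(word)
    counts = Counter()
    for row in rows:
        counts[row["Season"]] += len(pattern.findall(row["Line"]))
    return dict(counts)


print(count_word(SAMPLE_ROWS, "dude"))                    # 4
print(count_word(SAMPLE_ROWS, "dude", seasons={1}))       # 3
print(count_word(SAMPLE_ROWS, "dude", character="Stan"))  # 2
print(count_word_by_season(SAMPLE_ROWS, "dude"))          # {1: 3, 2: 1}
```

Matching on word boundaries keeps "dude" from being counted inside longer words, and escaping the search term lets the same helper handle a series of words such as "screw you guys".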
Example usage of south_park_plots.py is shown at the bottom of the file. Uncommenting the following line:
# plot_swear_count_season_frequency()
will generate the corresponding plot from the blog post.
For more information, consult the blog post.