Software corresponding to the blog post

Primary language: Python. License: MIT.

south_park_data

Python scripts for the corresponding blog post: http://vprusso.github.io/blog/2017/data-driven-south-park/

Overview

This repository consists of 70,000 lines of dialog extracted from "South Park" (courtesy of Ksenia Sukhova over at Kaggle) as well as two Python scripts:

  1. south_park_analyze.py: A number of functions to determine how often a given word or series of words is used by a specific character in a given set of seasons / episodes.

  2. south_park_plots.py: Replicates all of the plots generated in the blog post.

Usage

Example usage of south_park_analyze.py is shown at the bottom of the file. Uncommenting the following line:

# Question: How many times is the word “dude” said in all of the seasons 1 to 18:
#print ( word_count_by_season_and_episode(df, word="dude") )

will print out how many times the word "dude" is said in "South Park" across seasons 1 through 18. More example usage can be found in the corresponding blog post.
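For readers who want a feel for this kind of query without opening the script, here is a minimal sketch of a word count over a pandas DataFrame. It is not the code from south_park_analyze.py: the helper `count_word`, its parameters, and the column names (Season, Episode, Character, Line) are assumptions made for illustration, and the four-row DataFrame is a made-up stand-in for the real data.

```python
import re

import pandas as pd

# Tiny made-up stand-in for the dialog data. The column names are an
# assumption about the CSV layout, not taken from the repository.
df = pd.DataFrame(
    {
        "Season": [1, 1, 2, 2],
        "Episode": [1, 2, 1, 1],
        "Character": ["Stan", "Kyle", "Stan", "Cartman"],
        "Line": [
            "Dude, this is pretty messed up.",
            "Dude! Dude, come on.",
            "Whatever, dude.",
            "Screw you guys, I'm going home.",
        ],
    }
)


def count_word(frame, word, seasons=None, character=None):
    """Hypothetical helper: count whole-word, case-insensitive matches."""
    subset = frame
    if seasons is not None:
        # Keep only rows from the requested seasons.
        subset = subset[subset["Season"].isin(seasons)]
    if character is not None:
        # Keep only lines spoken by the requested character.
        subset = subset[subset["Character"] == character]
    # \b anchors make "dude" match as a whole word only.
    pattern = r"\b" + re.escape(word) + r"\b"
    return int(subset["Line"].str.count(pattern, flags=re.IGNORECASE).sum())


total_dude = count_word(df, "dude")
stan_season_1 = count_word(df, "dude", seasons=[1], character="Stan")
print(total_dude, stan_season_1)
```

Matching on word boundaries rather than plain substrings is a design choice of this sketch; the actual script may count differently.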

Example usage of south_park_plots.py is shown at the bottom of the file. Uncommenting the following line:

# plot_swear_count_season_frequency()

will generate the following plot:

[Plot: occurrence of swear words by South Park season.]

For more information, consult the blog post.
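As a rough illustration of the data behind a per-season plot like this one, the sketch below totals matches for a small word list in each season. It is not the code from south_park_plots.py: the helper `word_count_per_season`, the word list, the column names (Season, Line), and the sample rows are all assumptions.

```python
import re

import pandas as pd

# Made-up sample rows; the column names are an assumption about the CSV layout.
df = pd.DataFrame(
    {
        "Season": [1, 1, 2, 3],
        "Line": [
            "Dude, what the hell?",
            "Hell no, dude.",
            "Oh my God, they killed Kenny!",
            "Dude.",
        ],
    }
)


def word_count_per_season(frame, words):
    """Hypothetical helper: total whole-word matches of any listed word per season."""
    # Build one alternation pattern, e.g. \b(?:dude|hell)\b
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    per_line = frame["Line"].str.count(pattern, flags=re.IGNORECASE)
    # Sum the per-line counts within each season.
    return per_line.groupby(frame["Season"]).sum()


per_season = word_count_per_season(df, ["dude", "hell"])
print(per_season)
```

If matplotlib is installed (it is not among the dependencies listed here), `per_season.plot(kind="bar")` would draw the resulting Series as a bar chart.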

Dependencies

  1. Python 3,
  2. pandas,
  3. nltk.