
Tutorial for beginners, PyCon 2014


Python for Social Scientists

title: Python for Social Scientists
author:
  name: Renee Chu
  twitter: "@reneighbor"
output: slides.html
controls: true



Tutorial, PyCon 2014 Montreal


Intended audience:

People learning to code who have completed a Python workshop or an online class



Requirements:

  • Python 2.x (the examples use the Python 2 print statement)
  • Matplotlib
  • NumPy
  • A text editor


Survey of programming levels in the audience


My background

  • Econ major, liberal arts college
  • No coding at school
  • First job in sales and support
  • Learned coding through workshops (Railsbridge and PyLadies) and online (Stanford Engineering Everywhere, Learn Python the Hard Way)


Advice I received:

Pick a project you're passionate about.


Hacking on data is:

  • Interesting
  • Widely available
  • Easier setup than web


Answer some interesting questions:

Should development resources be spent on family planning or fighting disease?

<iframe width="560" height="315" src="http://www.tubechop.com/watch/2507136" frameborder="0" allowfullscreen></iframe>


What data would we need to test this empirically for at least one country?

  • Childhood mortality across years
  • Births per woman across years


World Bank Data Indicators page



Answer some other interesting questions:

Who has more mobile phone subscriptions per 100 people:

  • Finland
  • United States


Mobile phone subscriptions per 100 people, 1960-2011


Source: World Bank, http://data.worldbank.org/indicator/IT.CEL.SETS.P2


Answer some interesting questions:

Who has more mobile phone subscriptions per 100 people:

  • Finland
  • United States
  • El Salvador


Mobile phone subscriptions per 100 people, 1960-2011



Plans are cheaper in lower-income countries...

  • US: $59.00/mo
  • Finland: $40.10/mo
  • India: $12.90/mo

Prices are for unbundled total-package (voice + SMS + data) plans available to individual consumers. No El Salvador data is available. Source: Open Technology Initiative (http://newamerica.net/publications/policy/an_international_comparison_of_cell_phone_plans_and_prices)


Prices are much cheaper within-network (incentive for multiple accounts)

Tigo El Salvador "Basic" mobile postpaid, per min:

  • Between Tigo: $0.08
  • Other networks: $0.13
  • Landlines: $0.13
  • To USA/Canada: $0.09



Today we're going to:

  1. Import CSV data into Python
  2. Find a Matplotlib example
  3. Pipe our CSV data into Matplotlib


1. Import CSV data into Python


Get started:

In your folder for personal projects:

git clone https://github.com/reneighbor/python-for-social-scientists.git


What's here?


  • data/
    • fertility.csv
    • childhood_deaths.csv
  • read_data.py
  • chart_csv.py


Running a dead-simple program

  • Open up the folder

  • Open up "read_data.py"

  • Type:

    print "Hello world"


Find your new file in your terminal

cd Projects/personal-projects/python-for-social-scientists
python read_data.py

You should see "Hello world" printed back at you


Follow these steps:

[Erase the print statement]

import csv
  • Libraries are collections of functions and helpers written by other people; csv ships with Python


Follow these steps:

import csv

csvfile = open('data/childhood_deaths.csv', 'rU')


Follow these steps:

import csv

csvfile = open('data/childhood_deaths.csv', 'rU')
reader = csv.DictReader(csvfile)
  • DictReader lets you traverse the contents of the CSV file row by row, with each row behaving like a dictionary
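To see what DictReader hands back without needing the data files, here is a minimal sketch that reads a small in-memory CSV. The Finland values are the ones shown on the dictionary slide; the second row ("Exampleland") is a made-up placeholder, and io.StringIO is only a stand-in for the file object that open() returns. It uses print() so it also runs on Python 3, whereas the slides use the Python 2 print statement.

```python
import csv
import io

# Wide layout assumed here: one row per country, one column per year.
# "Exampleland" and its numbers are placeholders, not real data.
SAMPLE_CSV = (
    "Country Name,Country Code,2007,2008,2009\n"
    "Finland,FIN,114.924474,128.4719884,144.1530224\n"
    "Exampleland,EXL,50.0,60.0,70.0\n"
)

# io.StringIO stands in for the file object returned by open().
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)

for row in rows:
    # Each row is a dictionary keyed by the column headers.
    print(row["Country Name"], row["2007"])
```

The first line of the CSV supplies the dictionary keys, which is why `row["Country Name"]` works without any extra setup.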


This is a Python dictionary

{
	'Country Name': 'Finland',
	'Country Code': 'FIN',
	'2007': '114.924474',
	'2008': '128.4719884',
	'2009': '144.1530224'
}


This is the same Python dictionary (key order does not matter)

{
	'2007': '114.924474',
	'2009': '144.1530224',
	'Country Name': 'Finland',
	'2008': '128.4719884',
	'Country Code': 'FIN'
}
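The two listings above hold the same key/value pairs in a different order. A short sketch of what that means in practice: Python compares dictionaries by their contents, and values are looked up by key rather than by position. The variable names are illustrative only.

```python
# Same pairs, written in two different orders.
first = {
    "Country Name": "Finland",
    "Country Code": "FIN",
    "2007": "114.924474",
}
second = {
    "2007": "114.924474",
    "Country Code": "FIN",
    "Country Name": "Finland",
}

# Equality ignores the order in which the pairs were written.
same = (first == second)
print(same)

# Lookup is always by key.
print(second["Country Name"])
```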


Follow these steps:

import csv

csvfile = open('data/childhood_deaths.csv', 'rU')
reader = csv.DictReader(csvfile)

for row in reader:
	print row


Follow these steps:

Run it! In your terminal:

cd Projects/personal-projects/python-for-social-scientists
python read_data.py


Follow these steps:

import csv

csvfile = open('data/childhood_deaths.csv', 'rU')
reader = csv.DictReader(csvfile)

for row in reader:
	if row['Country Name'] == "Finland":
		print row
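For anyone following along on Python 3, here is a hedged sketch of the same filter. Two things differ from the slides: print is a function, and the 'rU' mode is no longer accepted by recent Python 3 releases, so the file is opened with newline="" as the csv documentation recommends. The sketch writes its own tiny stand-in file (with a placeholder "Exampleland" row) so it runs without the repository's data folder.

```python
import csv
import os
import tempfile

# Create a small stand-in CSV so the example is self-contained.
# "Exampleland" is a placeholder row, not real data.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as out:
    out.write("Country Name,Country Code,2007\n")
    out.write("Finland,FIN,114.924474\n")
    out.write("Exampleland,EXL,50.0\n")

# "with" closes the file automatically once the block ends.
with open(path, newline="") as csvfile:
    reader = csv.DictReader(csvfile)
    finland_rows = [row for row in reader if row["Country Name"] == "Finland"]

print(finland_rows)
```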


Any questions?


What are some things we observe about the output?


What are some things we observe about the output?

  • Not sorted
  • Data in strings
  • Every row
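The "data in strings" observation is easy to check directly. This small sketch uses the Finland values from the dictionary slide and shows that adding two CSV values joins the text instead of summing the numbers, until they are converted with float(). It uses print() so it runs on Python 3.

```python
row = {
    "Country Name": "Finland",
    "Country Code": "FIN",
    "2007": "114.924474",
    "2008": "128.4719884",
}

# Values read from a CSV are text, even when they look like numbers.
is_text = isinstance(row["2007"], str)
print(is_text)

# "+" on two strings glues them together.
joined = row["2007"] + row["2008"]
print(joined)

# After conversion, "+" adds the numbers.
total = float(row["2007"]) + float(row["2008"])
print(total)
```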


2. Find a Matplotlib example


Find something to copy




Examine the code: what do we know?

  • First half is drawing the rects, second is labels
  • The values are in a list


Run the sample code

  • Create a new file "mens_womens.py" inside python-for-social-scientists and paste in the sample code
  • In your terminal run "python mens_womens.py"
  • Do you get the chart?
  • Edit the values for "menMeans" and "womenMeans". Do you see a change?


Matplotlib bar chart, simplified

  • Look at basic_chart.py


In your terminal:

python basic_chart.py
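The contents of basic_chart.py are not reproduced in these slides, so the following is only a guess at what a stripped-down single-series bar chart might look like. The Finland values come from the dictionary slide; the variable names, the output filename, and the choice of the "Agg" backend (which writes a PNG instead of opening a window) are assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt

years = ["2007", "2008", "2009"]
finland_data = [114.924474, 128.4719884, 144.1530224]

# One bar per year, positioned at 0, 1, 2.
positions = list(range(len(years)))

fig, ax = plt.subplots()
rects = ax.bar(positions, finland_data, width=0.6)

# Label the x axis with the years instead of 0, 1, 2.
ax.set_xticks(positions)
ax.set_xticklabels(years)
ax.set_title("Finland")

fig.savefig("basic_chart_sketch.png")
```

Replacing fig.savefig(...) with plt.show(), and dropping the matplotlib.use("Agg") line, should open the chart in a window on a desktop machine.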


In your text editor:

How do we get the other countries' data?

  • In basic_chart.py add 2 series, el_salvador_data and usa_data
  • Uncomment the commented-out lines
  • Run it again
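As a rough illustration of the steps above, the sketch below draws three series side by side by shifting each one along the x axis by one bar width. The names el_salvador_data and usa_data come from the slide, but their numbers here are placeholders rather than World Bank figures, and the layout is an assumption, not the code in basic_chart.py.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt

years = ["2007", "2008", "2009"]
finland_data = [114.924474, 128.4719884, 144.1530224]
# Placeholder numbers for illustration only, NOT real data.
el_salvador_data = [100.0, 110.0, 120.0]
usa_data = [80.0, 85.0, 90.0]

series = [
    ("Finland", finland_data),
    ("El Salvador", el_salvador_data),
    ("United States", usa_data),
]

width = 0.25
base = list(range(len(years)))

fig, ax = plt.subplots()
all_rects = []
for offset, (label, data) in enumerate(series):
    # Shift each series right by one bar width so the bars sit side by side.
    xs = [x + offset * width for x in base]
    all_rects.append(ax.bar(xs, data, width=width, label=label))

# Centre the year labels under each group of three bars.
ax.set_xticks([x + width for x in base])
ax.set_xticklabels(years)
ax.legend()

fig.savefig("three_countries_sketch.png")
```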


Piping in the CSV data

  • Open chart_csv.py
  • What needs to be done in order to extract data?


Piping in the CSV data

What needs to be done in order to extract data?

  • Turn data from strings into numbers
  • Strip out country names
  • Sort by year
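One possible shape for those three steps is sketched below. The function name extract_data and its signature are hypothetical and not taken from the repository; the sketch assumes the wide layout shown on the dictionary slide (name and code columns plus one column per year) and skips blank cells.

```python
import csv
import io

SAMPLE_CSV = (
    "Country Name,Country Code,2009,2007,2008\n"
    "Finland,FIN,144.1530224,114.924474,128.4719884\n"
)


def extract_data(reader, country):
    """Return (years, values) for one country, sorted by year."""
    for row in reader:
        if row["Country Name"] != country:
            continue
        # Strip out the name and code columns by keeping year-like keys only,
        # then sort the years numerically.
        year_keys = sorted((k for k in row if k and k.isdigit()), key=int)
        # Convert the remaining strings to numbers, skipping blank cells.
        pairs = [(int(k), float(row[k])) for k in year_keys if row[k]]
        return [year for year, _ in pairs], [value for _, value in pairs]
    # Country not found.
    return [], []


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
years, values = extract_data(reader, "Finland")
print(years)
print(values)
```

The sample columns are deliberately out of order to show that the sort step does the work, rather than the order of the file.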


Give it a try

-- Break


Share what you've written


Comparing 2 countries

  • How would you edit chart_csv.py to graph multiple countries?


Comparing 2 countries

  • Run extractData() on 2 other countries
  • Uncomment the rectangle-drawing code for those countries
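One detail worth knowing when extracting several countries: a csv reader can only be looped over once, so a second pass over the same reader finds nothing. A minimal sketch of one way around that is to load the rows into a list first. The helper values_for is hypothetical (it is not the repository's extractData()), and "Exampleland" is a placeholder row.

```python
import csv
import io

# "Exampleland" and its numbers are placeholders, not real data.
SAMPLE_CSV = (
    "Country Name,Country Code,2007,2008\n"
    "Finland,FIN,114.924474,128.4719884\n"
    "Exampleland,EXL,50.0,60.0\n"
)

# Read everything into a list so the rows can be searched repeatedly.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


def values_for(rows, country):
    """Return the yearly values for one country, in year order."""
    for row in rows:
        if row["Country Name"] == country:
            year_keys = sorted((k for k in row if k and k.isdigit()), key=int)
            return [float(row[k]) for k in year_keys if row[k]]
    return []


# One list of values per country, ready to hand to a bar-drawing call.
series = {}
for country in ["Finland", "Exampleland"]:
    series[country] = values_for(rows, country)

print(series)
```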


Comparing 2 indicators

What do we need to do to compare mortality against fertility?


Comparing 2 indicators

What do we need to do to compare mortality against fertility?

  • Take in 2 CSV readers
  • Take in only one country
  • rename "rects" to be indicators instead of countries
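The list above could be approached roughly as follows: read each indicator from its own CSV, keep a single country, and line the two series up on the years they share. Every number below is a placeholder invented for the sketch (not fertility or mortality data), and country_series is a hypothetical helper, not a function from the repository.

```python
import csv
import io

# Placeholder values for illustration only, NOT real indicator data.
FERTILITY_CSV = (
    "Country Name,Country Code,1960,1961,1962\n"
    "El Salvador,SLV,6.0,5.9,\n"
)
DEATHS_CSV = (
    "Country Name,Country Code,1960,1961,1962\n"
    "El Salvador,SLV,190.0,185.0,180.0\n"
)


def country_series(csv_text, country):
    """Return {year: value} for one country, skipping blank cells."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country Name"] == country:
            return {
                int(key): float(value)
                for key, value in row.items()
                if key and key.isdigit() and value
            }
    return {}


fertility = country_series(FERTILITY_CSV, "El Salvador")
mortality = country_series(DEATHS_CSV, "El Salvador")

# Only years present in both indicators can be compared.
shared_years = sorted(set(fertility) & set(mortality))
paired = [(year, fertility[year], mortality[year]) for year in shared_years]
print(paired)
```

The blank 1962 fertility cell shows why the intersection matters: that year drops out of the comparison instead of raising an error.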


From Github:

programming_for_social_scientists compare_indicators_starter.py

  • Re-arranged to accommodate two comparison-drawing functions and a main()


Give it a try

-- [Chart: El Salvador 1960-2013]


Public service announcement:

correlation != causation


What we've done:

  • Imported a CSV and turned it into a dict
  • Went to Matplotlib and found a bar chart to borrow
  • Drew a series comparing countries from one CSV
  • Drew a series comparing indicators from two CSVs


Your Turn

  • Pick a data set that interests you.
  • Write code to visualize it.
  • Teach us something!