- teacher: Suneel Chakravorty, twitter: @suneelius
- email: chakravorty@post.harvard.edu
- guest access for chatroom: https://www.hipchat.com/geDHwPme2
- class chatroom: https://suneelius.hipchat.com/chat
- wifi: GA guest, password: yellowpencil
- review/test/warm up knowledge of lists, for loops, logical control flow, and list comprehensions
- Write a function that prints all the even numbers between 1 and 10,000.
- Write a function that returns a list of the numbers between 1 and 10,000 that are divisible by 3.
- The same as the previous exercise, but use a Python list comprehension.
- Write a function get_max that takes a list of numbers and returns the max of those numbers; don't use the builtin max() function. Afterward, try using max().
- Write a function is_odd_or_div_by_7 that returns True if a number is odd or divisible by 7 and False otherwise. Then write it using a lambda function.
- Use is_odd_or_div_by_7 and list comprehensions to write a function get_sublist_of_numbers_odd_or_div_by_7 that takes in a list and returns a sublist of those numbers that are either odd or divisible by 7.
- Given a list of food orders, e.g. ["burger", "fries", "burger", "tenders", "apple pie"], write a function get_aggregate_order_counts that takes the list and returns a dictionary with the different dishes as keys and the number of times they appear in the list as the values. For example, it takes ["burger", "fries", "burger", "tenders", "apple pie"] and outputs { "burger": 2, "fries": 1, "tenders": 1, "apple pie": 1 }.
- Use collections.Counter to achieve the same functionality.
- Write a function get_most_popular_order_data that takes a list of orders but, instead of returning a dictionary with the counts, outputs a tuple: the dish that appears the most in the list and the number of times it appears. So the output given the example would be ("burger", 2).
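One possible way to approach a few of the warm-up exercises is sketched below. It is only an illustration, not the official solution: the function names come from the exercises, while is_odd_or_div_by_7_lambda and get_aggregate_order_counts_with_counter are names made up here to keep the variants apart.

```python
from collections import Counter


def is_odd_or_div_by_7(n):
    """Return True if n is odd or divisible by 7, False otherwise."""
    return n % 2 == 1 or n % 7 == 0


# The same predicate written as a lambda (hypothetical name).
is_odd_or_div_by_7_lambda = lambda n: n % 2 == 1 or n % 7 == 0


def get_sublist_of_numbers_odd_or_div_by_7(numbers):
    """Keep only the numbers that satisfy the predicate."""
    return [n for n in numbers if is_odd_or_div_by_7(n)]


def get_aggregate_order_counts(orders):
    """Count dishes with a plain dictionary."""
    counts = {}
    for dish in orders:
        counts[dish] = counts.get(dish, 0) + 1
    return counts


def get_aggregate_order_counts_with_counter(orders):
    """Same result using collections.Counter (hypothetical name)."""
    return dict(Counter(orders))


def get_most_popular_order_data(orders):
    """Return (dish, count) for the most frequent dish.

    Raises IndexError on an empty list; handling that case is left open.
    """
    return Counter(orders).most_common(1)[0]


orders = ["burger", "fries", "burger", "tenders", "apple pie"]
print(get_aggregate_order_counts(orders))
print(get_most_popular_order_data(orders))
```

Counter.most_common(1) returns a list holding a single (dish, count) tuple, which is why the sketch indexes it with [0].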
- use the csv library to read in data
- use pure Python techniques to extract insights about the data
- start getting acquainted with the Pandas library
- Using the csv library, read in data from rock.csv, which you can download here: https://www.dropbox.com/s/cbffxkqq0ujru58/rock.csv?dl=0
- How many songs are from 1981?
- How many songs are from before 1984?
- What is the earliest release year in the data? (HINT: You might have to account for/clean up dirty data)
- What are the top 20 songs by play count? (HINT: use the builtin sorted() function; documentation here: https://wiki.python.org/moin/HowTo/Sorting)
- Who are the top 10 most prolific artists in the data along with the number of their songs that appear in the data?
- How many different artists appear in the data?
- How many songs have 'Rock' in the title?
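The questions above can be approached with csv.DictReader and plain Python. The sketch below runs on a small in-memory sample rather than rock.csv itself. The column names ("Song", "Artist", "Release Year", "PlayCount") and the sample rows are assumptions made for illustration; the real file's headers may differ and should be checked first.

```python
import csv
import io

# Hypothetical sample standing in for rock.csv. Column names and rows are
# invented for illustration; adjust them to match the real file.
SAMPLE = """Song,Artist,Release Year,PlayCount
Rock Song A,Band One,1981,12
Song B,Band Two,,7
Song C,Band One,1975,30
Rock Song D,Band Three,unknown,3
"""


def parse_year(text):
    """Return the year as an int, or None for blank or dirty values."""
    text = text.strip()
    return int(text) if text.isdigit() else None


# With the real file, something like this should work instead:
#   with open("rock.csv", newline="") as f:
#       rows = list(csv.DictReader(f))
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

years = [parse_year(row["Release Year"]) for row in rows]

songs_from_1981 = sum(1 for year in years if year == 1981)
earliest_year = min(year for year in years if year is not None)

# sorted() with a key function; reverse=True puts the highest count first.
top_by_plays = sorted(rows, key=lambda row: int(row["PlayCount"]), reverse=True)[:2]

rock_in_title = sum(1 for row in rows if "rock" in row["Song"].lower())
distinct_artists = len({row["Artist"] for row in rows})

print(songs_from_1981, earliest_year, rock_in_title, distinct_artists)
print([row["Song"] for row in top_by_plays])
```

Filtering out None before calling min() is one way to handle the dirty release years mentioned in the hint; whether the title match should be case-insensitive is a judgment call.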
- Instructions for Mac: http://docs.continuum.io/anaconda/install.html#mac-install
- Instructions for Windows: http://docs.continuum.io/anaconda/install.html#windows-install
- Setting up your environment, using the conda command line tool:
conda create -n ga-python pandas matplotlib ipython
source activate ga-python
ipython --pylab
>>> import pandas
- read_csv
- .columns
- [:], like list slicing
- .head()/.tail()
- filtering
- restricting columns
- add columns
- getting count of rows from .index
- getting value counts with .value_counts()
- applying functions to a column with .apply()
- string methods
- .plot
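The operations listed above might look roughly like this on a tiny DataFrame. The data and the column names ("name", "city", "plays") are invented for illustration and are not taken from the class datasets.

```python
import io

import pandas as pd

# Hypothetical data; column names are illustrative only.
CSV_TEXT = """name,city,plays
Alpha,New York,10
Beta,Boston,25
Gamma,New York,5
"""

# read_csv accepts a file path as well as a file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.columns.tolist())   # column names
print(df[:2])                # row slicing, like list slicing
print(df.head(2))            # first rows
print(df.tail(1))            # last rows

ny = df[df["city"] == "New York"]          # filtering rows with a boolean mask
subset = df[["name", "plays"]]             # restricting columns
df["plays_doubled"] = df["plays"] * 2      # adding a column
row_count = len(df.index)                  # count of rows from .index
city_counts = df["city"].value_counts()    # how often each value appears
name_lengths = df["name"].apply(len)       # apply a function to a column
upper_names = df["name"].str.upper()       # string methods via .str

# df["plays"].plot() would draw a chart; it requires matplotlib.
```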
- practice using Pandas
- learn how to sort
Download the CSV from https://github.com/suneel0101/lesson-plan/blob/master/crunchbase_monthly_export.csv
- What are the top 10 highest funded companies?
- How many companies are from New York?
- What are the most popular Markets?
- Plot the number of companies per region, limited to the 20 most popular regions
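A rough sketch of the sorting and counting involved, using a made-up three-row table. The column names ("name", "funding_total_usd", "city", "market") are assumptions and may not match the actual export, which may also need cleaning before funding amounts can be sorted numerically. sort_values is the method in current pandas releases; older versions used a different sorting method.

```python
import io

import pandas as pd

# Hypothetical rows and column names, for illustration only.
CSV_TEXT = """name,funding_total_usd,city,market
Acme,5000000,New York,Software
Bolt,12000000,Boston,Biotech
Cargo,750000,New York,Software
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Sort descending by funding and keep the top rows (use head(10) on real data).
top_funded = df.sort_values("funding_total_usd", ascending=False).head(2)

# A boolean mask sums to the number of matching rows.
ny_count = (df["city"] == "New York").sum()

# value_counts() orders values from most to least frequent.
popular_markets = df["market"].value_counts()

# On real data, something like
#   df["region"].value_counts().head(20).plot(kind="bar")
# would chart the 20 most common regions, assuming a "region" column exists.

print(top_funded["name"].tolist())
print(ny_count)
print(popular_markets)
```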