Getting data with Python

A web scraping tutorial

This repository contains the materials I use to teach basic web scraping and data acquisition to non-coding audiences. The core of the workshop is the Getting Data with Python.ipynb notebook, which uses the HTML files stored in wikisource to create the final output, all_letters.csv.
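For readers who want a feel for the overall flow before opening the notebook, here is a minimal sketch of turning a folder of saved HTML files into a single CSV. It is not the notebook's code: it uses only the Python standard library, and the function names (html_to_text, build_csv) and the column names (filename, text) are assumptions made for illustration. The notebook may use a different parser and extract different fields.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_csv(source_dir, output_path):
    """Write one CSV row per .html file in source_dir (hypothetical layout)."""
    rows = []
    for path in sorted(Path(source_dir).glob("*.html")):
        html = path.read_text(encoding="utf-8")
        rows.append({"filename": path.name, "text": html_to_text(html)})
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Under these assumptions, calling build_csv("wikisource", "all_letters.csv") from the repository root would read every .html file in the wikisource folder and write one row per file. Working from saved files rather than live pages keeps the workshop repeatable and avoids hitting the source website repeatedly.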

There is also a collection of less organized material in the messy folder. Not for the faint of heart, this folder may nonetheless interest someone who wants to experiment with more advanced techniques, such as applying Google's cloud natural language processing to this dataset. It holds unpolished material and some dead ends that may be useful to someone else, but it is also here so that I don't forget how I did things.