README

This repo includes Python 2.7 and Python 3.5 code used to generate a dataset from scratch. It also includes the data at various stages of development (raw, messy, clean), all found in the data folder. In addition, there is a pickle_files folder that contains, you guessed it, pickle files. All of this has been done for both Python 2.7 and Python 3.5.

Note: without setting out to explore this, I discovered that Python 3's pickling process is roughly 3-4 times more efficient (at least with my files) than Python 2's. Yet another reason to migrate to Python 3!
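
To check the note above on your own data, here is a minimal sketch that measures pickled size and pickling time per protocol. It is not taken from the code in this repo: the `pickle_stats` helper and the `sample` data are hypothetical stand-ins, and the note does not say whether "efficient" refers to file size or speed, so the sketch reports both. One possible explanation (an assumption, not something measured for this repo) is the default protocol: Python 2's `pickle` defaults to the text-based protocol 0, while Python 3.5 defaults to the binary protocol 3.

```python
import pickle
import timeit


def pickle_stats(obj, protocol, repeats=5):
    """Return (size_in_bytes, best_seconds) for pickling obj with one protocol."""
    size = len(pickle.dumps(obj, protocol=protocol))
    # Take the fastest of several runs to reduce timing noise.
    seconds = min(
        timeit.repeat(
            lambda: pickle.dumps(obj, protocol=protocol),
            number=1,
            repeat=repeats,
        )
    )
    return size, seconds


# Hypothetical stand-in data; swap in an object built from the data folder.
sample = [
    {"id": i, "value": i * 0.5, "label": "row %d" % i}
    for i in range(10000)
]

# Protocol 0 is the Python 2 default; protocol 2 is the highest that
# Python 2 supports; HIGHEST_PROTOCOL depends on the running interpreter.
for protocol in (0, 2, pickle.HIGHEST_PROTOCOL):
    size, seconds = pickle_stats(sample, protocol)
    print("protocol %d: %d bytes, %.4f s" % (protocol, size, seconds))
```

Running the same script under both interpreters, and passing an explicit protocol when dumping, should show how much of the difference comes from the protocol and how much from the interpreter itself.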