NMLPG Reading Group

Repository of example code, notebooks, data, etc. for the NMLPG "reading group" at Narrative Science.

Getting started

Dependencies

We assume you're running Python 3.6+. The requirements.txt file lists most of the dependencies you'll need to run the code in this repository. Just do the usual:

pip install -r requirements.txt
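If the install fails in odd ways, it can be worth confirming that the interpreter actually meets the Python 3.6+ assumption. The snippet below is a minimal sketch of such a check; the names MIN_PYTHON and meets_minimum are hypothetical helpers, not part of this repository.

```python
import sys

# Minimum version assumed by this README (hypothetical constant name).
MIN_PYTHON = (3, 6)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is >= `minimum`.

    Only the major and minor components are compared.
    """
    current = sys.version_info if version is None else version
    return tuple(current[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version looks fine:", sys.version.split()[0])
    else:
        print("Python 3.6+ is assumed; found", sys.version.split()[0])
```

Run it with the same interpreter you plan to use for pip, since a machine can have several Python installations.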

One extra piece of setup is to install any spaCy models you'll need. Since we'll be working with word vectors, I would go ahead and install the one with the most extensive vector information:

python -m spacy download en_core_web_lg

This downloads the model to disk and is a one-time operation. You'll then be able to set up a spaCy parser by running:

import spacy
nlp = spacy.load('en_core_web_lg')
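Because the download is a one-time step, it is easy to forget whether it has been done in a given environment. The sketch below checks for the model without loading it, relying on the fact that the spaCy download command installs each model as an ordinary Python package. The function name spacy_model_installed is a hypothetical helper, not something spaCy or this repository provides.

```python
import importlib.util


def spacy_model_installed(model_name):
    """Return True if a package named `model_name` is importable.

    spaCy models installed via `python -m spacy download` are regular
    Python packages, so an importable package of that name is a reasonable
    proxy for "the model is available". This does not load the model.
    """
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    name = "en_core_web_lg"
    if spacy_model_installed(name):
        print(name, "is installed")
    else:
        print(name, "is missing; run: python -m spacy download", name)
```

The check only uses the standard library, so it is cheap to run at the top of a notebook before the much slower spacy.load call.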

See the spaCy documentation for more details on spaCy models.

If you want to run LargeVis, you'll have to install and run it separately in a Python 2 environment, following the LargeVis installation instructions. Specific examples in the relevant notebook(s) folder explain the details.

Running notebooks

To run the notebook files, just navigate to the notebooks directory and run jupyter notebook from the command line. This will open the Jupyter UI in your default web browser, and you should be able to access any of the notebooks from there.

Datasets

Amazon review data

Thanks to Julian McAuley for sharing this data. From the dataset homepage:

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014...includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

See the "Introduction to Amazon review data" for an overview of the data format and how we're working with it.
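As a rough illustration of what working with the review files can look like, here is a minimal sketch that streams reviews from a gzipped file. It assumes one strict JSON object per line, and the field names (asin, overall, reviewText) are assumptions about the dataset format; the notebook is the authority on the actual layout. The sample records are made up so the example runs without the real data, and the helper names are hypothetical.

```python
import gzip
import json
import os
import tempfile


def iter_reviews(path):
    """Yield one review dict per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def write_sample(path):
    """Write two made-up reviews (assumed field names) to a gzipped file."""
    sample = [
        {"asin": "B000000001", "overall": 5.0, "reviewText": "Works great."},
        {"asin": "B000000002", "overall": 2.0, "reviewText": "Stopped working."},
    ]
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for review in sample:
            handle.write(json.dumps(review) + "\n")


def load_sample_reviews():
    """Round-trip the made-up sample through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample_reviews.json.gz")
        write_sample(path)
        return list(iter_reviews(path))


if __name__ == "__main__":
    for review in load_sample_reviews():
        print(review["asin"], review["overall"], review["reviewText"])
```

Streaming line by line keeps memory use flat, which matters for a dataset of this size; collect into a list only for small subsets.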