NLP/IE workshop for the Tucson Data Science meetup (6/30/2016)

Primary language: Jupyter Notebook. License: MIT.

NLP Information Extraction for the easily bored

Please fork this repository and follow along.

NOTE: this is a work in progress. Check back later for updates...

Table of Contents

NOTE: When viewing the slides, it's easiest to advance using fn + Down Arrow.

  1. NLP Information Extraction for the easily bored
  • slides / notebook
  • How do we get useful things out of a sea of text?
  • Learn about finding people, places, organizations, etc.
  2. Introduction to py-processors
  • slides / notebook
  • An overview of the natural language processing (NLP) library we'll be using in the examples

Examples

Here you'll find a few use cases illustrating the concepts covered in the intros.

  1. Who, what, when, and where? Making sense of web-based news
  2. Getting structured information out of Wikipedia pages
  • slides / notebook
  • You now know a little about how to find named entities (people, places, organizations, etc.) in text, but how do these entities relate to one another?
  • Challenge: Try to populate a Wikipedia infobox for Barack Obama.
  3. Movie reviews
  • slides / notebook
  • Is it a positive or negative review? If we don't have a score, can we tell from the review text?
  • NOTE: To really get into this example, you'll need a Rotten Tomatoes developer key.
  • Challenge: Predict critics' consensus scores based only on the review text
    • Use whatever method you want
      • feature-based classifier, latent feature model, etc.
    • What works and why?
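
As a starting point for the movie review challenge, here is a minimal sketch of one possible feature-based classifier: a bag-of-words Naive Bayes model with add-one smoothing, written with only the Python standard library. It is not the method used in the workshop notebooks. The function names (`tokenize`, `train`, `predict`) and the four training snippets are hypothetical and made up for illustration; they are not real reviews or Rotten Tomatoes data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, made-up snippets (assumption: NOT real review data).
TOY_REVIEWS = [
    ("a funny, moving, and beautifully acted film", "positive"),
    ("sharp writing and a great cast", "positive"),
    ("a dull, predictable, and lifeless mess", "negative"),
    ("boring plot and terrible dialogue", "negative"),
]

model = train(TOY_REVIEWS)
print(predict("great cast and funny writing", model))
print(predict("a boring and predictable plot", model))
```

With real review text you would want far more training data and a held-out set to measure accuracy; comparing this baseline against richer features (for example, named entities or syntactic information) is one way to approach the "What works and why?" question.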

Installation

There are a couple of things you'll need to run the notebooks in this repository...

Requirements

  • Java 8
  • 2-3 GB of RAM available for running the NLP server

Python dependencies via conda

conda create -n bored python=3
source activate bored
pip install -r requirements.txt

Running the notebooks

The notebooks are all under /notebooks

If you want to run/alter them locally after installing the project dependencies, simply run this command:

jupyter notebook