NLP/IE workshop for the Tucson Data Science meetup (6/30/2016)
Please fork this repository and follow along.
NOTE: this is a work in progress. Check back later for updates...
NOTE: When viewing the slides, it's easiest to advance using fn + Down Arrow.
NLP/Information Extraction for the easily bored
- slides / notebook
- How do we get useful things out of a sea of text?
- Learn about finding people, places, organizations, etc.
- Introduction to py-processors
- slides / notebook
- An overview of the natural language processing (NLP) library we'll be using in the examples
Here you'll find a few use cases illustrating the concepts covered in the intros.
- Who, what, when, and where? Making sense of web-based news
- slides / notebook
- go from html -> people, places, etc.
- Learn how to do basic IE on an article you may have read from The Guardian
- Challenge: How do we disambiguate organizations and people?
- Getting structured information out of Wikipedia pages
- slides / notebook
- You now know a little about how to find named entities (people, places, organizations, etc.) in text, but how do they relate to one another?
- Challenge: Try to populate a Wikipedia infobox for Barack Obama.
- Movie reviews
- slides / notebook
- Is it a positive or negative review? If we don't have a score, can we tell from the review text?
- NOTE: To really get into this example, you'll need a Rotten Tomatoes developer key
- Challenge: Predict critics' consensus scores based only on the review text
- Use whatever method you want
- feature-based classifier, latent feature model, etc.
- What works and why?
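As a starting point for the movie review challenge, here is a minimal sketch of one kind of feature-based classifier: a bag-of-words naive Bayes model with add-one smoothing, written with only the standard library. It is not the method used in the workshop notebook. The function names (`tokenize`, `train`, `predict`) and the four toy reviews are made up for illustration, and the whitespace tokenizer ignores punctuation handling entirely.

```python
import math
from collections import Counter, defaultdict

# Made-up toy reviews, for illustration only (not real review data).
TOY_REVIEWS = [
    ("a wonderful moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a dull boring mess", "neg"),
    ("terrible acting and a boring plot", "neg"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count words per label. `examples` is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of reviews
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocab = model
    total_reviews = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior for the label.
        score = math.log(label_counts[label] / total_reviews)
        # Add-one smoothing so unseen (word, label) pairs don't zero out the score.
        denominator = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TOY_REVIEWS)
    print(predict(model, "a great film"))
```

A model like this only sees individual words, so negation ("not great") and sarcasm will trip it up. That makes it a useful baseline to compare against when asking what works and why.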
There are a couple of things you'll need to run the notebooks in this repository:
- Java 8
- 2 to 3 GB of RAM available for running the NLP server
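If you aren't sure which Java version is on your machine, a small check along these lines may help. This is only a sketch: the helper names `parse_java_major` and `installed_java_major` are invented here, and the parsing assumes the usual `java -version` output format (for example `java version "1.8.0_91"`), which can vary between vendors.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases before Java 9 report themselves as 1.x (1.8.0_91 is Java 8).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if not found."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr, so check both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("No usable `java` found on PATH.")
    else:
        print("Found Java", major)
```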
Python dependencies via conda
conda create -n bored python=3
source activate bored
pip install -r requirements.txt
The notebooks are all under /notebooks
If you want to run/alter them locally after installing the project dependencies, simply run this command:
jupyter notebook