PyCon UK 2016

Natural Language Processing in 10 Lines of Code

At Cytora we use NLP to extract and analyse plain text to build our structured information product.

This is the repo for our workshop at PyCon UK. In this repository you will find the step-by-step tutorial from the workshop, covering some basic Natural Language Processing tasks using spaCy, a powerful (and super fast) NLP library.

Getting started

Clone this repo from GitHub and open the directory. On a UNIX machine, these actions look like this:

git clone https://github.com/cytora/pycon-nlp-in-10-lines.git
cd pycon-nlp-in-10-lines

We recommend installing all the required dependencies in a virtual environment such as virtualenv, although this step can be skipped.

virtualenv -p python3 venv
source venv/bin/activate

If you are using the Miniconda release of Python, you can use conda environments instead, so your virtual environment setup will be slightly different:

conda create --name venv python=3
source activate venv
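To confirm that the environment is actually active before installing anything, a small check along these lines may help. This is only a sketch: the function name `in_virtual_environment` is made up for illustration, and the check relies on how virtualenv, venv and conda usually mark an active environment, so it may not cover every setup.

```python
import os
import sys


def in_virtual_environment():
    """Return True if this interpreter appears to run inside an isolated environment."""
    # Older virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return True
    # conda exports CONDA_DEFAULT_ENV while an environment is active.
    return bool(os.environ.get("CONDA_DEFAULT_ENV"))


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Isolated environment detected:", in_virtual_environment())
```

If the printed interpreter path does not point inside your environment directory, the environment is probably not active in the current shell.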

To install all the Python dependencies required for this tutorial, run this command in the cloned directory:

pip install -r requirements.txt
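After the install finishes, you can check from Python that the packages are importable. The sketch below is an assumption-laden example: the helper `missing_packages` is hypothetical, and the package names `spacy` and `notebook` are guesses based on the tools this tutorial uses, not a copy of requirements.txt.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list; adjust it to match the actual requirements.txt.
    expected = ["spacy", "notebook"]
    missing = missing_packages(expected)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All expected packages were found.")
```

The check only looks up whether each package can be located; it does not import the packages or verify their versions.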

To install the spaCy model, run:

sputnik --name spacy --repository-url http://index.spacy.io install en==1.1.0

To start Jupyter Notebook, run:

jupyter notebook

The tutorial has three parts:

00_spacy_intro.ipynb - Introduction to spaCy
01_pride_and_predjudice.ipynb - Real text analysis (Pride & Prejudice) (blogpost)
02_rand_dataset - Open task on RAND dataset (blogpost)
