Simple-NLP

This is a very simple micromaterial created for the Oxford Summer of Hacks Language Hack Day.

The aim is to give learners practice with a simple NLP task: finding the most frequent words in a text (a frequency distribution) and calculating the type/token ratio (the number of unique words divided by the total number of words).
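
To make the idea of a frequency distribution concrete, here is a minimal sketch using only the Python standard library. The sample sentence, the lowercase-and-split tokenization, and the variable names are assumptions made for this example; they are not taken from the activity's skeleton code.

```python
from collections import Counter

# Hypothetical sample text; any string would do.
text = "the cat sat on the mat and the dog sat too"

# Naive tokenization (an assumption): lowercase, then split on whitespace.
tokens = text.lower().split()

# Frequency distribution: how many times each word occurs.
freq = Counter(tokens)

# The two most frequent words, most common first.
print(freq.most_common(2))  # [('the', 3), ('sat', 2)]
```

Splitting on whitespace leaves punctuation attached to words, so a real text would probably need some extra cleaning first.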

Learning objectives

understand what a type is and what a token is
count the total words (tokens) in a text
convert a text into unique words
count the unique words (types) in a text
calculate the type/token ratio of a text

The activity

One big skeleton function has already been written, along with the test for it. To complete the activity, fill in the function and run the test. If the test passes, you did it! If not, try to fix the function so that the test passes.

To run the test: python -m unittest
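
For readers new to unittest, the sketch below shows the general shape of a test that this command would pick up. Called with no arguments, python -m unittest runs test discovery from the current directory and collects files whose names match test*.py. The function count_tokens, the class name, and the file name test_example.py are hypothetical stand-ins, not the names used in this repository.

```python
import unittest


def count_tokens(text):
    """Hypothetical stand-in for a function under test."""
    # Naive tokenization (an assumption): split on whitespace.
    return len(text.split())


class TestCountTokens(unittest.TestCase):
    def test_counts_whitespace_separated_words(self):
        # "a rose is a rose" contains five tokens.
        self.assertEqual(count_tokens("a rose is a rose"), 5)
```

Saved as test_example.py, this would be found and run by the command above.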

Possible steps:

1) Find out about types and tokens.

2) Find out how to turn text into a list of words.

3) Find out how to turn a list of words into a list of unique words.

4) Keep track of the total words and unique words, then calculate the ratio.
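
The steps above can be sketched roughly as follows. This is one possible approach under stated assumptions, not the intended solution: the function name type_token_stats is hypothetical, and the tokenization is a plain lowercase-and-split that ignores punctuation.

```python
def type_token_stats(text):
    """Return (token_count, type_count, ratio) for a text.

    Hypothetical helper for illustration; not the activity's skeleton function.
    """
    # Step 2: turn the text into a list of words (naive whitespace split).
    tokens = text.lower().split()
    # Step 3: reduce the list of words to its unique words.
    types = set(tokens)
    # Step 4: count both, then calculate the ratio (0.0 for an empty text).
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(tokens), len(types), ratio


print(type_token_stats("a rose is a rose is a rose"))  # (8, 3, 0.375)
```

A ratio close to 1 means that few words are repeated; a lower ratio means more repetition.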