/simple-NLP-stats

A simple micromaterial to help learners practice simple NLP tasks.

Primary language: Python

Simple-NLP

This is a very simple micromaterial created for the Oxford Summer of Hacks Language Hack Day.

The aim is to give learners practice in doing a very simple NLP task: finding the most frequent words in a text (frequency distribution), and also finding the type/token ratio (number of unique words / number of total words).
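As a rough illustration of both ideas, the sketch below uses Python's standard `collections.Counter` on a made-up sentence. The sample text and the naive whitespace splitting are assumptions for illustration only, not part of the activity's skeleton code.

```python
from collections import Counter

# Hypothetical sample text, chosen only for illustration.
text = "the cat sat on the mat and the dog sat too"

# Naive tokenization: lowercase, then split on whitespace.
tokens = text.lower().split()

# Frequency distribution: how many times each word occurs.
freq_dist = Counter(tokens)
top_two = freq_dist.most_common(2)
print(top_two)  # [('the', 3), ('sat', 2)]

# Type/token ratio: unique words (types) divided by total words (tokens).
ttr = len(freq_dist) / len(tokens)
print(round(ttr, 2))  # 0.73 (8 types / 11 tokens)
```

Real texts contain punctuation and mixed case, so a more careful tokenizer may give different counts than a plain `split()`.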

Learning objectives

  • understand what a type is and what a token is
  • count the total words (tokens) in a text
  • convert a text into a list of unique words
  • count the unique words (types) in a text
  • calculate the type/token ratio of a text

The activity

One big skeleton function has already been written, along with the test for it. To complete the activity, fill in the function and run the test. If the test passes, you did it! If not, fix the function until the test passes.

To run the test: python -m unittest
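For readers new to `unittest`, here is a minimal sketch of what a test of this kind can look like. The function name `type_token_ratio`, the test class, and the test cases are all hypothetical; the names and checks in the repository's actual test file may differ.

```python
import unittest


def type_token_ratio(text):
    """Hypothetical stand-in for the skeleton function in this activity."""
    tokens = text.lower().split()
    if not tokens:
        # Avoid dividing by zero on empty input (an assumed convention).
        return 0.0
    return len(set(tokens)) / len(tokens)


class TestTypeTokenRatio(unittest.TestCase):
    def test_repeated_word(self):
        # "a b a" has 2 types ("a", "b") over 3 tokens.
        self.assertAlmostEqual(type_token_ratio("a b a"), 2 / 3)

    def test_empty_text(self):
        self.assertEqual(type_token_ratio(""), 0.0)
```

Saved in a file whose name starts with `test`, a class like this is picked up by `python -m unittest` through its default test discovery.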

Possible steps:

1) Find out about types and tokens.

2) Find out about turning text into a list of words.

3) Find out about turning a list of words into a list of unique words.

4) Keep track of the total words and unique words, then calculate the ratio.
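Steps 2 to 4 above could be sketched roughly as follows. The function name `text_stats`, the regular expression used for tokenizing, and the sample sentence are assumptions made for illustration; the skeleton function in the repository may be structured differently.

```python
import re


def text_stats(text):
    """Return (total words, unique words, type/token ratio) for a text."""
    # Step 2: turn the text into a list of words (tokens).
    # The pattern keeps runs of letters and apostrophes and drops punctuation.
    tokens = re.findall(r"[a-z']+", text.lower())

    # Step 3: turn the list of words into unique words (types).
    types = set(tokens)

    # Step 4: keep track of both counts, then calculate the ratio.
    total = len(tokens)
    unique = len(types)
    ratio = unique / total if total else 0.0
    return total, unique, ratio


stats = text_stats("To be, or not to be: that is the question.")
print(stats)  # (10, 8, 0.8)
```

Lowercasing before counting treats "To" and "to" as the same type; whether that is the right choice depends on how the activity defines a word.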