Holiday fun learning about n-grams

Virtual_Nietzsche

Virtual Nietzsche is a predictive sentence generator in the style of Friedrich Nietzsche. We built it from scratch to better understand n-grams.

A classifier is also trained to identify which of the five books a given line most likely comes from.

Dataset

We used a dataset from Kaggle (https://www.kaggle.com/hsankesara/books-of-friedrich-nietzsche). This dataset contains five books written by Nietzsche in plain-text form.

The contents of the books were extracted and compiled into a new file (clean_data.txt) containing all the sentences, with [BEGIN] and [END] tags marking the beginning and end of each sentence.
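
The tagging step might look roughly like the sketch below. It is only an illustration: the function name `tag_sentences`, the naive punctuation-based sentence splitter, and the sample text are assumptions, not the code used in the notebook to produce clean_data.txt.

```python
import re

BEGIN, END = "[BEGIN]", "[END]"

def tag_sentences(text):
    """Split raw text into sentences and wrap each one in [BEGIN] ... [END].

    Hypothetical helper: the real preprocessing may split sentences differently.
    """
    # Collapse runs of whitespace, including line breaks inside paragraphs.
    text = re.sub(r"\s+", " ", text).strip()
    # Naive split: break on whitespace that follows '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [f"{BEGIN} {s} {END}" for s in sentences if s]

# Made-up sample text, not taken from the dataset.
sample = "This is one sentence. Is this another?\nYes it is!"
tagged = tag_sentences(sample)
for line in tagged:
    print(line)
```

A splitter this simple will break on abbreviations and quoted punctuation, so a real corpus would likely need extra cleaning rules.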

On n-grams

Unigrams: P(W_i)

Bigrams: P(W_i | W_(i-1))

Trigrams: P(W_i | W_(i-1), W_(i-2))
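
Here W_i is the i-th word of a sentence. As a minimal sketch of the bigram case, the probability can be estimated by counting: P(W_i | W_(i-1)) = count(W_(i-1), W_i) / count(W_(i-1)). The function names (`train_bigrams`, `bigram_prob`, `generate`) and the toy corpus below are hypothetical and only meant to illustrate the idea; the notebook's own implementation may differ.

```python
import random
from collections import Counter, defaultdict

BEGIN, END = "[BEGIN]", "[END]"

def train_bigrams(tagged_sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        tokens = sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts

def bigram_prob(counts, prev, word):
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[word] / total if total else 0.0

def generate(counts, rng, max_words=30):
    """Sample a sentence word by word, starting from the [BEGIN] tag."""
    word, out = BEGIN, []
    for _ in range(max_words):
        followers = counts.get(word)
        if not followers:
            break
        # Pick the next word in proportion to its bigram count.
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == END:
            break
        out.append(word)
    return " ".join(out)

# Toy corpus for illustration only; the real input would be clean_data.txt.
corpus = [
    "[BEGIN] the cat sat [END]",
    "[BEGIN] the cat ran [END]",
    "[BEGIN] the dog sat [END]",
]
counts = train_bigrams(corpus)
print(bigram_prob(counts, "the", "cat"))  # 2 of the 3 occurrences of "the" precede "cat"
print(generate(counts, random.Random(0)))
```

The [BEGIN] and [END] tags do real work here: [BEGIN] gives the generator a starting context, and sampling [END] tells it when to stop. A trigram version would key the counts on the pair of preceding words instead of a single word.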