Automated Sentence Generation using Dictionary and Hashing based on Markovian processes

Primary Language: Jupyter Notebook

Automated-Sentence-Generation-using-Dictionary-and-Hashing

This is a CS110 assignment. The bot takes an initial string of n words and a word-count parameter, wordcount, and produces a text with the given word count. The motivation for this project is that autogenerated text is used in a variety of fields, from filler text to SEO rank-boosting content. In addition, many applications use autocompletion to help people type faster by suggesting the most relevant word in a given context.

We will use Python dictionaries and custom hash tables because these data structures fit our purpose well: they provide a simple, fast mapping between keys and values. For a dictionary, inserting a single mapping takes $O(1)$ time on average, and finding a value given its key also takes $O(1)$ time on average. This matters because our generator repeatedly looks up values by key. I looked into the resource provided by Prof. Ribeiro in my proposal feedback and concluded that my algorithm is very similar to an n-gram model. In this case, the key is a prefix (two words) and the value is a list of words. For any given prefix, the algorithm finds the slot whose key is that prefix and accesses its value, a list holding every word that appeared after that prefix in our original text ("The World as Will and Representation" by Arthur Schopenhauer). The algorithm then picks a word from the list uniformly at random and appends it to the generated text. Because the list keeps one entry per occurrence, words that followed the prefix more often in the original text have a higher chance of being sampled; therefore, more relevant words are suggested for a given prefix.
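The prefix-to-followers mapping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's actual code: the function names `build_chain` and `generate`, the `prefix_len` and `rng` parameters, and the short sample corpus are all assumptions made for the example, and a built-in `dict` stands in for the custom hash table.

```python
import random
from collections import defaultdict


def build_chain(text, prefix_len=2):
    """Map each prefix (a tuple of prefix_len words) to the list of words that followed it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - prefix_len):
        prefix = tuple(words[i:i + prefix_len])
        # Duplicates are kept on purpose: a word that follows the prefix
        # more often appears more often in the list.
        chain[prefix].append(words[i + prefix_len])
    return dict(chain)


def generate(chain, seed, wordcount, prefix_len=2, rng=None):
    """Extend the seed string until it has wordcount words or the prefix is unknown."""
    rng = rng or random.Random()
    output = seed.split()
    while len(output) < wordcount:
        prefix = tuple(output[-prefix_len:])
        followers = chain.get(prefix)
        if not followers:
            break  # this prefix never occurred in the source text
        # Uniform choice over a list with duplicates is frequency-weighted sampling.
        output.append(rng.choice(followers))
    return " ".join(output[:wordcount])


# Hypothetical toy corpus; the project itself uses Schopenhauer's text.
corpus = "the will is the thing in itself and the will is blind"
chain = build_chain(corpus)
print(generate(chain, "the will", 8))
```

Storing prefixes as tuples makes them hashable, so they can serve directly as dictionary keys. The generator stops early when it reaches a prefix that never appeared in the source text, so the output can be shorter than `wordcount`.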