Personality and Stylistic Scoring of Retrieved Utterances
This is a project for personalizing the utterances of a conversational agent (SlugBot for UCSC).
Work done
- Extracted utterances of characters from Friends and The Big Bang Theory.
- Extracted LIWC, n-gram, POS features from utterances.
- Trained and tested a classification model in the extracted features.
Prerequisites
For now, you need just NLTK to run the above codes. Please use Python 3.5+.
How to run the above codes? (Will provide a detailed guide later )
Clone the repository to your machine.
The file clean.py is for cleaning the original dataset provided. The results from the code are stored in each character file named Chandler_all.txt, Ross_all.txt etc. You can run it by,
py clean.py
The other files available now for feature extraction are extract_pos_bigrams.py (for extracting POS bigrams)