This project uses pycrfsuite to label utterances with their corresponding dialog acts.
A training set of around 900 CSV files containing dialogues was used to train the CRFsuite model, with the following baseline features for each utterance:
- A marker indicating whether the speaker changed from the previous utterance
- A feature for each token in the utterance
- A feature for each token's part-of-speech (POS) tag in the utterance
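A minimal sketch of how these baseline features could be built for one dialogue. It assumes each CSV row has already been parsed into a hypothetical `Utterance` record holding the speaker and a list of (token, POS tag) pairs; the record and the feature-name prefixes are illustrative, not the project's actual schema. Each utterance becomes one item in the sequence, represented as a list of feature strings, which is one of the input forms pycrfsuite accepts.

```python
from typing import List, NamedTuple, Optional, Tuple


class Utterance(NamedTuple):
    """Hypothetical parsed form of one CSV row (not the project's real schema)."""
    speaker: str
    tokens: List[Tuple[str, str]]  # (token, POS tag) pairs


def baseline_features(dialogue: List[Utterance]) -> List[List[str]]:
    """Return one list of feature strings per utterance in the dialogue."""
    xseq: List[List[str]] = []
    prev_speaker: Optional[str] = None
    for i, utt in enumerate(dialogue):
        feats: List[str] = []
        # Speaker-change marker: only meaningful from the second utterance on.
        if i > 0 and utt.speaker != prev_speaker:
            feats.append("SPEAKER_CHANGED")
        for token, pos in utt.tokens:
            feats.append("TOKEN_" + token)  # one feature per token
            feats.append("POS_" + pos)      # one feature per token's POS tag
        xseq.append(feats)
        prev_speaker = utt.speaker
    return xseq
```

The resulting list, together with the dialogue's list of dialog act labels, could then be passed to `pycrfsuite.Trainer.append(xseq, yseq)`.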
To improve accuracy, an advanced feature set containing the following features was used:
- A feature for every consecutive pair of words (word bigrams)
- A feature for commonly occurring letter pairs, such as "th", "he", and "in"
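The advanced features might be generated along these lines. This is a sketch under several assumptions: `advanced_features` is a hypothetical helper that takes an utterance's tokens and returns extra feature strings to append to its baseline list, letter-pair features fire on presence rather than count, and `COMMON_LETTER_PAIRS` holds only the three pairs named above because the full list used is not given.

```python
from typing import List

# Hypothetical: only the three pairs named in the text; the full list is not given.
COMMON_LETTER_PAIRS = ("th", "he", "in")


def advanced_features(tokens: List[str]) -> List[str]:
    """Return word-bigram and letter-pair features for one utterance's tokens."""
    feats: List[str] = []
    # Word bigrams: one feature per consecutive pair of tokens.
    for first, second in zip(tokens, tokens[1:]):
        feats.append(f"BIGRAM_{first}|{second}")
    # Letter pairs: one feature per common pair found anywhere in the utterance.
    # Tokens are joined with spaces, so a pair never spans two tokens.
    text = " ".join(tokens).lower()
    for pair in COMMON_LETTER_PAIRS:
        if pair in text:
            feats.append(f"LETTERS_{pair}")
    return feats
```

Recording presence keeps every feature binary, matching the list-of-strings input used for the baseline features; counts would instead require pycrfsuite's dict form with numeric values.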
With the advanced features, accuracy increased by 1.3%.
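How accuracy was measured is not stated. One plausible reading, assumed here, is the fraction of utterances across all held-out dialogues whose predicted act matches the gold act, with predictions coming from `pycrfsuite.Tagger.tag`. The hypothetical `accuracy` helper below computes that.

```python
from typing import Sequence


def accuracy(gold: Sequence[Sequence[str]], predicted: Sequence[Sequence[str]]) -> float:
    """Fraction of utterances, pooled over all dialogues, labeled with the gold act."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of dialogues")
    correct = total = 0
    for gold_seq, pred_seq in zip(gold, predicted):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("each predicted sequence must align with its gold sequence")
        correct += sum(g == p for g, p in zip(gold_seq, pred_seq))
        total += len(gold_seq)
    # Guard against an empty evaluation set.
    return correct / total if total else 0.0
```

Under this definition, the gain from the advanced features would be `accuracy(gold, advanced_pred) - accuracy(gold, baseline_pred)`; the text does not say whether the 1.3% is such an absolute difference (percentage points) or a relative one.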