This repository contains the code for a study of six sentiment lexicons used for suicide risk assessment, including all code used to extract sentiment words from the eHOST-IT case-control cohort of CRIS clinical notes. The six lexicons are not provided and must be downloaded separately. These are:
- AFINN
- Linguistic Inquiry and Word Count (LIWC) 2015 lexicon
- NRC Word-Emotion Association Lexicon (aka EmoLex)
- Pattern lexicon
- Opinion lexicon
- SentiWordNet 3.0

The scripts are as follows:
- emotions.py: code to prepare data and lexicons for experiments and extract sentiment words.
- emotions_afinn.py: code to extract sentiment words using AFINN.
- emotions_emolex.py: code to extract sentiment words using EmoLex.
- emotions_pattern.py: code (Python 2.7) to extract sentiment words using the Pattern lexicon.
- emotions_pattern_p36.py: code (Python 3.6) to extract sentiment words from previously tokenised text using Pattern.
- emotions_swn.py: code to extract sentiment words using the NLTK interface for SentiWordNet 3.0.
- sentiment_extraction.py: code to calculate frequency statistics and test the cross-corpus statistical significance of frequency differences (Mann-Whitney U test).
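
The general shape of the workflow described above (lexicon lookup on tokenised text, per-document frequencies, then a Mann-Whitney U test between cohorts) can be sketched roughly as follows. This is a minimal illustration and not the code in the scripts: the lexicon entries, the documents, and the names `TOY_LEXICON`, `sentiment_rate`, `midranks`, and `mann_whitney_u` are all invented for the example. The p-value uses a plain normal approximation without tie or continuity correction, which is only rough for small samples.

```python
import math

# Hypothetical word -> valence lexicon in the style of AFINN.
# The entries and scores are invented for illustration.
TOY_LEXICON = {"hopeless": -2, "sad": -2, "calm": 2, "happy": 3}


def sentiment_rate(tokens, lexicon):
    """Fraction of tokens in one document that appear in the lexicon."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in lexicon)
    return hits / len(tokens)


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using a normal approximation.

    Assumption: no tie correction of the variance and no continuity
    correction, so the p-value is approximate.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Invented, already-tokenised documents standing in for the two cohorts.
case_docs = [["feeling", "hopeless", "and", "sad"], ["so", "sad", "today"], ["hopeless", "again"]]
control_docs = [["calm", "and", "well"], ["routine", "review", "today"], ["happy", "with", "progress", "overall"]]

case_rates = [sentiment_rate(doc, TOY_LEXICON) for doc in case_docs]
control_rates = [sentiment_rate(doc, TOY_LEXICON) for doc in control_docs]

u_stat, p_value = mann_whitney_u(case_rates, control_rates)
print("U =", u_stat, "p =", round(p_value, 3))
```

For real analyses, an established implementation such as `scipy.stats.mannwhitneyu` is likely a better choice than this hand-rolled version, since it handles ties and small samples more carefully.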