A project for CS8803-CSS at Georgia Tech by Andrew Dai and Taha Merghani, replicating "Document-level Sentiment Inference with Social, Faction, and Discourse Context" by Eunsol Choi, Hannah Rashkin, Luke Zettlemoyer, and Yejin Choi (ACL 2016).
BibTeX citation:
@InProceedings{Choi:2016:ACL,
author = {Choi, Eunsol and Rashkin, Hannah and Zettlemoyer, Luke and Choi, Yejin},
title = {Document-level Sentiment Inference with Social, Faction, and Discourse Context},
booktitle = {Proceedings of the ACL},
year = {2016},
publisher = {Association for Computational Linguistics}
}
- Python 3
- virtualenv
- Jupyter Notebook
- Stanford CoreNLP
- Scipy
- CPLEX4 (Community edition)
- MPQA
- Data from authors (publicly available and privately shared)
(Optional) Set up a virtual environment
$ virtualenv -p python3 venv
$ source venv/bin/activate
Install (Python) dependencies
$ pip install -r requirements.txt
Start Jupyter notebook
$ jupyter notebook
The paper introduces a document-level integer linear program (ILP) that includes base models and soft social constraints.
TODO
- overall formula (social + faction + all pairwise)
- faction inference (soft constraint) (Section 2.1)
  - input: pairwise entity factions extracted with the base model described in Section 3.2
- sentiment relations (Section 2.2)
  - input: pairwise entity sentiments extracted with the base model described in Section 3.1
- balance theory constraints
- reciprocity constraints
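To get a feel for how soft reciprocity and balance constraints interact with base-model scores, here is a minimal sketch. It is not the paper's ILP formulation: the violation definitions (opposite signs for reciprocity, a negative sign product over a triple for balance), the single `penalty` weight, and the exhaustive search standing in for an ILP solver are all simplifying assumptions, and every name below is hypothetical.

```python
from itertools import permutations, product

LABELS = (-1, 0, 1)  # negative, unbiased, positive


def reciprocity_violations(sent):
    """Count entity pairs whose two directed sentiments have opposite signs."""
    return sum(
        1
        for (i, j), s in sent.items()
        if i < j and s * sent.get((j, i), 0) < 0
    )


def balance_violations(sent):
    """Count triples (i, j, k) where sent(i->j) * sent(j->k) * sent(i->k) < 0."""
    entities = sorted({e for pair in sent for e in pair})
    return sum(
        1
        for i, j, k in permutations(entities, 3)
        if sent.get((i, j), 0) * sent.get((j, k), 0) * sent.get((i, k), 0) < 0
    )


def best_labeling(scores, penalty=1.0):
    """Exhaustively search all labelings (toy stand-in for an ILP solver).

    scores maps each directed pair (i, j) to a dict {label: base-model score}.
    Each violated soft constraint subtracts `penalty` from the objective.
    """
    pairs = sorted(scores)
    best, best_value = None, float("-inf")
    for labels in product(LABELS, repeat=len(pairs)):
        sent = dict(zip(pairs, labels))
        value = sum(scores[p][label] for p, label in sent.items())
        value -= penalty * (reciprocity_violations(sent) + balance_violations(sent))
        if value > best_value:
            best, best_value = sent, value
    return best


# Hypothetical base-model scores for two entities "a" and "b".
scores = {
    ("a", "b"): {1: 0.9, 0: 0.1, -1: 0.0},
    ("b", "a"): {1: 0.3, 0: 0.2, -1: 0.4},
}
print(best_labeling(scores, penalty=0.0))
print(best_labeling(scores, penalty=1.0))
```

With `penalty=0.0` each pair simply takes its highest-scoring label; with a positive penalty the search prefers labelings in which the two directions agree. Exhaustive search grows as 3^n in the number of pairs, so it is only suitable for toy inputs.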
The global model in Sec. 2 uses two base models, one for pairwise sentiment classification and the other for detecting faction relationships.
The input is plain text, and no gold labels are assumed: entity detection, dependency parsing, and coreference resolution are all automatic, and entity mentions include common nouns and pronouns.
The pairwise sentiment model predicts the sentiment between entity pairs: sent(e_i → e_j) ∈ {positive, unbiased, negative}.
The authors "trained separate classifiers for pairs that co-occur in a sentence and those that do not, using a linear class-weighted SVM classifier with crowd-sourced data...".
The sentiment label of a piece of text is defined to be positive if it contains more words from the positive sentiment lexicon than from the negative one (and similarly for the negative label). The authors used the MPQA sentiment lexicon.
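The lexicon-count rule above can be sketched in a few lines. This is an illustration only: the two word sets are tiny hypothetical stand-ins for the MPQA lexicon (no MPQA file parsing is shown), and returning `"unbiased"` on a tie is an assumption borrowed from the label set above.

```python
def lexicon_label(tokens, positive_words, negative_words):
    """Label text by comparing counts of positive and negative lexicon words."""
    pos = sum(1 for t in tokens if t.lower() in positive_words)
    neg = sum(1 for t in tokens if t.lower() in negative_words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "unbiased"  # assumption: ties map to the neutral label


# Hypothetical toy lexicon, not the MPQA word lists.
POSITIVE = {"great", "wonderful", "praised"}
NEGATIVE = {"bad", "criticized"}

print(lexicon_label("A great and wonderful but bad day".split(), POSITIVE, NEGATIVE))
```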
- Sentiment labels for:
  - Paths containing dobj and nsubj_rev, length <= 3 if path contains sentiment lexicon words
  - Paths e_i ↑ nsubj ↓ ccomp ↓ nsubj ↓ e_j (if exists)
  - Paths without any named entity
- Indicator for nmod:against
- NER (Named Entity Recognizer) types
- Percentage of sentences with entity co-occurrence
- Mentioned in the headline
- Appear only once in the document
- Add document sentiment when both entities are most frequent entities
- Rank of number of mentions of holder and target (e_i and e_j respectively), when they never co-occur in any sentence
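Two of the document-level features above (the share of sentences in which both entities co-occur, and the rank of each entity by mention count) can be sketched as follows. The input format, a list of sentences each given as a list of entity mentions, and both function names are assumptions made for illustration; the paper's exact denominators and tie-breaking are not specified here.

```python
from collections import Counter


def cooccurrence_fraction(sentences, e_i, e_j):
    """Fraction of sentences that mention both e_i and e_j."""
    if not sentences:
        return 0.0
    both = sum(1 for s in sentences if e_i in s and e_j in s)
    return both / len(sentences)


def mention_ranks(sentences):
    """Map each entity to its rank by number of mentions (1 = most mentioned)."""
    counts = Counter(e for s in sentences for e in s)
    return {e: rank for rank, (e, _) in enumerate(counts.most_common(), start=1)}


# Hypothetical document: each inner list holds the entity mentions of one sentence.
doc = [["obama", "putin"], ["obama"], ["putin", "obama", "obama"], ["merkel"]]
print(cooccurrence_fraction(doc, "obama", "putin"))
print(mention_ranks(doc))
```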
- Direct quotations
  - Extracted with regular expressions.
  - Sentiment label of quote applied to (speaker, entities in quote), excluding entities with fewer than 3 occurrences
  - Sentiment label also added to (speaker, most frequent entity)
- Indirect quotations
  - Connect speaker and quotation using "list of 20 verbs indicating speech events"
  - "Sentiment label of words connected to e_j via dependency path of length up to two that also includes the subject of the quotation verb to e_j"
  - Indicator for whether e_i is the subject of the quotation verb
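As an illustration of regular-expression extraction of direct quotations, here is a minimal sketch. The pattern, the four speech verbs, and the function name are hypothetical; the authors' actual expressions and their list of 20 verbs are not reproduced here. The pattern only handles straight double quotes followed by a verb and a single capitalized speaker name.

```python
import re

# Hypothetical subset of speech verbs, not the authors' list of 20.
SPEECH_VERBS = ("said", "says", "told", "stated")

# Matches: "quoted text" <verb> <CapitalizedName>
DIRECT_QUOTE = re.compile(
    r'"([^"]+)"\s+(?:' + "|".join(SPEECH_VERBS) + r")\s+([A-Z][A-Za-z]+)"
)


def extract_direct_quotes(text):
    """Return (speaker, quote) pairs for simple '"...," said X' constructions."""
    return [
        (speaker, quote.rstrip(",. "))
        for quote, speaker in DIRECT_QUOTE.findall(text)
    ]


text = '"We will not back down," said Obama. Putin gave no reply.'
print(extract_direct_quotes(text))
```

Each extracted quote could then be passed to a lexicon-based labeler, with the resulting label attached to the (speaker, entity) pairs described above.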
A pair of entities is marked as being in a faction if the dependency path between them:
- "contains only one link of modifier or compound label (nmod, nmod:poss, amod, nn, or compound)"
- "contains less than three links and has a possessive or appositive label (poss or appos)"
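The two rules above can be sketched as a predicate over the list of dependency labels on the path between two entities. This reads the first rule as "the path is a single link carrying a modifier or compound label", which is one possible interpretation of the quoted wording; the function name and the input representation are assumptions.

```python
MODIFIER_LABELS = {"nmod", "nmod:poss", "amod", "nn", "compound"}
POSSESSIVE_LABELS = {"poss", "appos"}


def is_faction(path_labels):
    """Apply the two path rules to a list of dependency labels."""
    # Rule 1 (assumed reading): a single link with a modifier or compound label.
    if len(path_labels) == 1 and path_labels[0] in MODIFIER_LABELS:
        return True
    # Rule 2: fewer than three links, at least one possessive or appositive.
    if len(path_labels) < 3 and any(l in POSSESSIVE_LABELS for l in path_labels):
        return True
    return False


print(is_faction(["compound"]))
print(is_faction(["appos", "nmod"]))
print(is_faction(["nsubj", "dobj"]))
```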
The authors describe improving faction detection as an "important area for future work".