This is an interface to pre-trained formality, informativeness, and implicature classifiers trained on the SQUINKY! corpus. The SQUINKY! corpus is a collection of 7,032 sentences manually annotated for formality, informativeness, and implicature. For details on the corpus and the annotation process, please see Lahiri (2015):
Lahiri, S. (2015). SQUINKY! A Corpus of Sentence-level Formality, Informativeness, and Implicature. arXiv:1506.02306. https://arxiv.org/pdf/1506.02306.pdf
Each of the three annotations has its own logistic regression classifier, trained on various syntactic features of natural language borrowed from the (unrelated) thesis by Vincze (2015). Please consult the thesis for details on feature selection and generation:
Vincze, V. (2015). Uncertainty detection in natural language texts (Doctoral dissertation, University of Szeged).
This interface outputs probabilities of the positive and negative classes, for each of the three annotations, for a given sentence. For example, given the sentence "A BIG THANKYOU GOES TO holli!", the output will be:
({'formal': 0.0041114378047021338, 'informal': 0.99588856219529787},
{'informative': 0.011593792054814324, 'ambiguous': 0.98840620794518563},
{'implicative': 0.95996335945188804, 'verbose': 0.040036640548111957})
It is strongly recommended that you read Lahiri (2015) before attempting to interpret the results: informativeness and implicature are complicated concepts, and their meaning should not be assumed.
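To make the shape of the output concrete, here is a minimal sketch that takes the example tuple shown above and picks the most probable label from each of the three dictionaries. It does not call the squinky package; the `result` value is copied from the example, and the helper name `top_labels` is hypothetical.

```python
# Output tuple copied verbatim from the example above; no squinky call is made.
result = (
    {'formal': 0.0041114378047021338, 'informal': 0.99588856219529787},
    {'informative': 0.011593792054814324, 'ambiguous': 0.98840620794518563},
    {'implicative': 0.95996335945188804, 'verbose': 0.040036640548111957},
)


def top_labels(result):
    """Return the most probable (label, probability) pair from each dictionary.

    Hypothetical helper for illustration; not part of the squinky interface.
    """
    return [max(scores.items(), key=lambda item: item[1]) for scores in result]


labels = top_labels(result)
for label, probability in labels:
    print(f"{label}: {probability:.3f}")
```

Taking the maximum is only one way to read the probabilities; a threshold other than 0.5 may be more appropriate depending on the application.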
sudo pip3 install squinky
# Train the classifiers using the provided training data.
squinky train /path/to/data.csv
# Validate the precision, recall, and f1-score for the provided training data
# using a 25% train/test split.
squinky validate --split=0.25 /path/to/data.csv
# Predict the Formality, Informativeness, and Implicature of the given sentence.
squinky predict "This is a test sentence."
The formality, informativeness, and implicature classifiers have the following precision, recall, and f1-scores:
Classifier | Precision | Recall | F1-Score |
---|---|---|---|
Formality | 0.82 | 0.82 | 0.82 |
Informativeness | 0.84 | 0.84 | 0.84 |
Implicature | 0.60 | 0.60 | 0.60 |
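For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall, so it equals both whenever the two are equal, as in the scores reported above. The sketch below only illustrates that relationship; the function name `f1_score` is chosen here for illustration and is not claimed to be how the package computes its scores.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the scores reported above.
reported = {
    "Formality": (0.82, 0.82),
    "Informativeness": (0.84, 0.84),
    "Implicature": (0.60, 0.60),
}

for name, (precision, recall) in reported.items():
    print(name, round(f1_score(precision, recall), 2))
```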
Lahiri (2015) provided a set of sample sentences with formality, informativeness, and implicature annotations. These classifiers have been validated against those examples. Example [3] fails for informativeness and example [6] fails for implicature. A dash (-) means no expected rating is given for that dimension.
# | Example from Lahiri (2015) | Expected FORM | Expected INFO | Expected IMPL | Predicted FORM | Predicted INFO | Predicted IMPL |
---|---|---|---|---|---|---|---|
[1] | A BIG THANKYOU GOES TO holli! | Low | Low | - | Low | Low | High |
[2] | As Maoists menace continued to be unabated, the government is all set to launch the much-awaited full-fledged anti-Naxal operations at three different areas, considered trijunctions of worst Naxal-affected states. | High | High | - | High | High | Low |
[3] | 4) 'We find no clear relation between income inequality and class-based voting.' | High | Low | - | High | High | High |
[4] | 2) Just wipe the Mac OS X partition when u install the dapper. | Low | High | - | Low | High | Low |
[5] | alright, well, i guess i just made a newbie mistake. | Low | - | High | Low | Low | High |
[6] | All seven aboard the Coast Guard plane are stationed at the Coast Guard Air Station in Sacramento, Calif., where their aircraft was based. | High | - | High | High | High | Low |
[7] | Maoists sabotaged Essar's 166-mile underground pipeline, which transfers slurry from one of India's most coveted iron ore deposits to the Bay of Bengal. | High | - | Low | High | High | Low |
[8] | Wait. | Low | - | Low | Low | Low | Low |
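The comparison in the table above can be checked mechanically. The following is a minimal sketch that transcribes the expected and predicted ratings by hand and lists every dimension where they disagree; it does not call the squinky package. The names `ROWS` and `find_mismatches` are illustrative, and a dash is assumed to mean that no expected rating is available.

```python
DIMENSIONS = ("FORM", "INFO", "IMPL")

# (example number, expected ratings, predicted ratings), transcribed from the table.
# "-" is assumed to mean "no expected rating for this dimension".
ROWS = [
    (1, ("Low", "Low", "-"), ("Low", "Low", "High")),
    (2, ("High", "High", "-"), ("High", "High", "Low")),
    (3, ("High", "Low", "-"), ("High", "High", "High")),
    (4, ("Low", "High", "-"), ("Low", "High", "Low")),
    (5, ("Low", "-", "High"), ("Low", "Low", "High")),
    (6, ("High", "-", "High"), ("High", "High", "Low")),
    (7, ("High", "-", "Low"), ("High", "High", "Low")),
    (8, ("Low", "-", "Low"), ("Low", "Low", "Low")),
]


def find_mismatches(rows):
    """Return (example number, dimension) pairs where prediction and expectation differ."""
    mismatches = []
    for number, expected, predicted in rows:
        for dimension, want, got in zip(DIMENSIONS, expected, predicted):
            # Skip dimensions with no expected rating.
            if want != "-" and want != got:
                mismatches.append((number, dimension))
    return mismatches


print(find_mismatches(ROWS))
```

Run against the transcribed rows, the only disagreements are example 3 on informativeness and example 6 on implicature.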