Word embeddings can be incredibly useful as features in classification models. Word embeddings are dense, low-dimensional representations of words, created in such a way that words used in similar contexts receive similar mathematical representations. Unlike bag-of-words approaches, word embeddings can capture semantic relationships between words, and they are fairly lightweight compared to embeddings from more complex models like ELMo and BERT.
Despite their usefulness and strong performance, word embeddings can be difficult to use in classification tasks where it is vital that researchers are able to explain the data generating process, not just provide an accurate prediction. The dimensions of a word embedding are not in themselves interpretable, and researchers therefore often shy away from using them.
In this notebook I present a simple framework for adding interpretability to classifiers built on word embeddings: a process for working our way back from the predictions of a classifier that takes word embeddings as input to the individual words or terms the model appears to be using to discriminate between your classes.
The approach can fairly simply be summarised as follows:
- Train a classifier that takes document level embeddings as input.
- Create a vocabulary matrix with individual word embeddings for all the words found in your corpus.
- Run the vocabulary matrix through the trained model and pick out the individual words the model is most confident belong to each class.
In the parts that follow I will demonstrate how this can be done by training a classifier to distinguish whether a newspaper headline came from a politics article or a sports article.
First, we need to get our data. The Guardian has a really good API for this purpose, so we can pull all politics and sports articles from the Guardian between two arbitrary points in time, say March and May 2018.
from GuardianAPI import GuardianAPI
import pandas as pd
gapi = GuardianAPI()
api_key: ········
sports_json = gapi.get_content({'section':'sport', 'from-date':'2018-03-1', 'to-date':'2018-05-30', 'code':'uk', 'page-size':50})
politics_json = gapi.get_content({'section':'politics', 'from-date':'2018-03-1', 'to-date':'2018-05-30', 'code':'uk', 'page-size':50})
df = pd.DataFrame(sports_json + politics_json)
df.sample(5)
| | id | type | sectionId | sectionName | webPublicationDate | webTitle | webUrl | apiUrl | isHosted | pillarId | pillarName |
---|---|---|---|---|---|---|---|---|---|---|---|
401 | sport/2018/may/06/justify-kentucky-derby-mende... | article | sport | Sport | 2018-05-06T00:08:06Z | Justify reigns in the rain at Kentucky Derby a... | https://www.theguardian.com/sport/2018/may/06/... | https://content.guardianapis.com/sport/2018/ma... | False | pillar/sport | Sport |
26 | sport/2018/may/28/marco-trungelliti-road-trip-... | article | sport | Sport | 2018-05-28T21:13:10Z | Marco Trungelliti makes 500-mile family road t... | https://www.theguardian.com/sport/2018/may/28/... | https://content.guardianapis.com/sport/2018/ma... | False | pillar/sport | Sport |
2207 | politics/2018/mar/11/labour-mps-should-not-app... | article | politics | Politics | 2018-03-11T11:15:13Z | Labour MPs should not appear on Russia Today, ... | https://www.theguardian.com/politics/2018/mar/... | https://content.guardianapis.com/politics/2018... | False | pillar/news | News |
2205 | politics/2018/mar/11/scottish-labour-conferenc... | article | politics | Politics | 2018-03-11T15:43:53Z | Scottish Labour conference backs Corbyn on Brexit | https://www.theguardian.com/politics/2018/mar/... | https://content.guardianapis.com/politics/2018... | False | pillar/news | News |
316 | sport/2018/may/11/guildford-extend-record-run-... | article | sport | Sport | 2018-05-11T18:28:33Z | Chess: Guildford extend their record unbeaten ... | https://www.theguardian.com/sport/2018/may/11/... | https://content.guardianapis.com/sport/2018/ma... | False | pillar/sport | Sport |
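The `GuardianAPI` class used above is a small wrapper of my own, and its code is not shown in this notebook. For readers who want to reproduce the data pull, here is a minimal, hypothetical sketch of what such a wrapper might look like, built on the `requests` library and the Guardian Open Platform's public search endpoint; the class and method names simply mirror the usage above, and the pagination handling is an assumption rather than the original implementation.

```python
# Hypothetical sketch of a GuardianAPI wrapper (not the original implementation).
# Assumes you have a Guardian Open Platform API key for the public search endpoint.
import requests
from getpass import getpass

class GuardianAPI:
    BASE_URL = 'https://content.guardianapis.com/search'

    def __init__(self, api_key=None):
        # Prompt for the key if none is supplied, mirroring the masked prompt above
        self.api_key = api_key or getpass('api_key: ')

    def get_content(self, params):
        # Page through the search results and return the raw article dictionaries
        results, page, pages = [], 1, 1
        while page <= pages:
            payload = {**params, 'api-key': self.api_key, 'page': page}
            response = requests.get(self.BASE_URL, params=payload).json()['response']
            results.extend(response['results'])
            pages = response['pages']
            page += 1
        return results
```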
We now need to get pre-trained word embeddings from gensim's downloader API and then create an embedding that represents each headline by taking the average embedding of all the words used in that headline.
# Call in dependencies
import gensim.downloader as api
import re
import numpy as np
from nltk.corpus import stopwords
# Get glove embeddings
glove_vectors = api.load('glove-wiki-gigaword-300')
# Get a set of stopwords
StopWords = set(stopwords.words('english'))
def text_preprocess(text, word_vectors, StopWords):
    tmp_txt = text.lower()  # lowercase all text
    tmp_txt = re.sub(r'\!', ' ! ', tmp_txt)  # pad each exclamation mark with spaces so it becomes its own token
    tmp_txt = re.sub(r'\?', ' ? ', tmp_txt)  # pad each question mark with spaces so it becomes its own token
    if StopWords:
        tokens = [w for w in tmp_txt.split() if w in word_vectors.vocab and w not in StopWords]
    else:
        tokens = [w for w in tmp_txt.split() if w in word_vectors.vocab]
    return tokens
def doc_vector(tokens, word_vectors):
    # Represent a document as the average of its word embeddings
    doc_vec = np.mean(np.array([word_vectors[word] for word in tokens]), axis=0)
    return doc_vec

def concat_texts(corpus, word_vectors):
    # Stack the document vectors into a single (n_documents, 300) matrix
    out_matrix = np.empty((len(corpus), 300), float)
    for row, txt in enumerate(corpus):
        out_matrix[row] = doc_vector(txt, word_vectors)
    return out_matrix
headlines_processed = [text_preprocess(x, glove_vectors, StopWords) for x in df.webTitle]
headline_matrix = concat_texts(headlines_processed, glove_vectors)
headline_matrix.shape
(2293, 300)
# Call dependency
from sklearn.model_selection import train_test_split
# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(headline_matrix,
df.sectionId,
test_size=0.2,
random_state=123)
print('Training Examples: ', X_train.shape, "\n"
'Test Examples: ', X_test.shape)
Training Examples: (1834, 300)
Test Examples: (459, 300)
We then need a matrix of the individual words used in the model, which we can use to work our way back from a model prediction to a set of terms that explains those predictions. We can do this by tokenising all the headlines in our corpus and putting the tokens into a set.
vocab = set()
for headline in df.webTitle:
vocab.update([w.lower() for w in headline.split() if w.lower() in glove_vectors.vocab])
vocab = list(vocab)
vocab_matrix = np.array([glove_vectors[w] for w in vocab])
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
A support vector machine attempts to find a line or hyperplane that separates the data points from each class with the largest possible margin.
One way to understand what our model is picking up on is to calculate the distance of each word in our vocabulary to the decision plane. Words that are far from the decision plane are presumably more discriminative, and should therefore give us a clue as to what the model is picking up on. Similarly, words that are very close to the separating line should be non-discriminative words: either general vocabulary or terms typically used in both domains.
We can get the distance of each word from the decision boundary by dividing the decision function value we get from running the word through the trained model by the L2 norm of the coefficient vector.
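In symbols, if the fitted linear SVM's decision function for a word vector $x$ is $f(x) = w \cdot x + b$, then the signed distance of that word from the separating hyperplane is

$$
d(x) = \frac{f(x)}{\lVert w \rVert_2} = \frac{w \cdot x + b}{\lVert w \rVert_2},
$$

which is exactly what `svm.decision_function(vocab_matrix) / np.linalg.norm(svm.coef_)` computes further below.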
Below is a two-dimensional illustration of what we expect to see, but the intuition generalises to multiple dimensions.
from sklearn.svm import SVC
svm = SVC(kernel = 'linear', C = 1, class_weight='balanced')
svm.fit(X_train, y_train)
SVC(C=1, break_ties=False, cache_size=200, class_weight='balanced', coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
y_pred = svm.predict(X_test)
print(classification_report(y_test, y_pred))
precision recall f1-score support
politics 0.91 0.97 0.94 158
sport 0.98 0.95 0.97 301
accuracy 0.96 459
macro avg 0.95 0.96 0.95 459
weighted avg 0.96 0.96 0.96 459
dist = svm.decision_function(vocab_matrix) / np.linalg.norm(svm.coef_)
distance_df = pd.DataFrame({'word':vocab, 'distance':dist})
distance_df.sort_values('distance', ascending = False).head(10)
| | word | distance |
---|---|---|
2582 | sevens | 2.561792 |
82 | a-league | 2.385136 |
2350 | rugby | 2.384202 |
1203 | nrl | 2.268408 |
1135 | paralympic | 2.252786 |
213 | championship | 2.213799 |
2177 | cycling | 2.199589 |
4190 | tennis | 2.159167 |
3435 | tournament | 2.144038 |
755 | squad | 2.125921 |
We can clearly see that the words furthest from the decision boundary on the positive side are all sports-related words like rugby, paralympic and tournament. This suggests that these words, and words similar to them, are particularly important for the model to recognise a headline as a sports headline.
Looking at the words furthest from the decision boundary on the negative side, these are, as expected, political terms like minister, labour and parliament. Again, this tells us that our model is focusing on terms similar to these when deciding whether to classify a headline as a politics headline.
distance_df.sort_values('distance', ascending = True).head(10)
| | word | distance |
---|---|---|
803 | minister | -2.222333 |
3697 | labour | -2.092312 |
1654 | parliament | -2.006268 |
2021 | liberal | -1.884801 |
4103 | tory | -1.868131 |
508 | party | -1.847066 |
2063 | conservatives | -1.814579 |
2584 | chancellor | -1.798007 |
5290 | tories | -1.710996 |
5216 | conservative | -1.624608 |
We can also look at words that are close to the decision boundary. Here we find, as hypothesised, a set of words that are either not particularly associated with politics or sports, like credentials or barrel, or associated with both, like appeal or affected.
distance_df.loc[distance_df.distance.between(-0.01, 0.01)].head(10)
| | word | distance |
---|---|---|
18 | launch | 0.001324 |
117 | appeal | -0.008476 |
145 | vital | 0.009059 |
195 | bale | -0.002664 |
303 | affected | -0.005008 |
329 | mcnicholl | -0.009333 |
354 | credentials | 0.001708 |
473 | barrel | -0.005074 |
518 | anything | 0.007034 |
571 | aims | 0.007861 |
We can perform a similar analysis with a logistic regression. However, rather than calculating the distance of each vocabulary word from the decision boundary, we predict the class probability of each word.
# Dependency
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(X_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='auto', n_jobs=None, penalty='l2',
random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
warm_start=False)
y_pred = lr.predict(X_test)
print(classification_report(y_test, y_pred))
precision recall f1-score support
politics 0.91 0.97 0.94 158
sport 0.98 0.95 0.97 301
accuracy 0.96 459
macro avg 0.95 0.96 0.95 459
weighted avg 0.96 0.96 0.96 459
prob_df = pd.DataFrame({'word':vocab, 'probability':lr.predict_proba(vocab_matrix)[:,0]})
prob_df.sort_values('probability', ascending = True).head(10)
| | word | probability |
---|---|---|
2582 | sevens | 0.000000e+00 |
3435 | tournament | 2.220446e-16 |
213 | championship | 2.220446e-16 |
2350 | rugby | 8.881784e-16 |
82 | a-league | 1.998401e-15 |
4410 | finals | 1.998401e-15 |
3602 | champions | 3.530509e-14 |
1203 | nrl | 3.819167e-14 |
4190 | tennis | 4.796163e-14 |
755 | squad | 7.638334e-14 |
Again we see near-identical results to those from the support vector machine. All the terms with a low probability of being a politics headline are sports terms.
Among the words given a high probability of belonging to a politics article, we see the same political terms popping up as with the SVM.
prob_df.sort_values('probability', ascending = False).head(10)
| | word | probability |
---|---|---|
803 | minister | 1.0 |
3697 | labour | 1.0 |
1654 | parliament | 1.0 |
2021 | liberal | 1.0 |
508 | party | 1.0 |
2063 | conservatives | 1.0 |
4103 | tory | 1.0 |
1034 | parliamentary | 1.0 |
5290 | tories | 1.0 |
435 | prime | 1.0 |
As you may have realised by now, this approach works for any classifier that can output a probability score, be it a simple decision tree, a naive Bayes classifier or a deep neural network.
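To make that concrete, here is a small illustrative helper (the function name and signature are my own, not part of the original notebook) that ranks the vocabulary for any fitted scikit-learn-style classifier exposing `predict_proba`:

```python
# Illustrative helper (an assumption, not from the original notebook): rank the vocabulary
# by how strongly a fitted classifier with predict_proba associates each word with a class.
def top_words_for_class(clf, vocab, vocab_matrix, class_index=0, n=10):
    probs = clf.predict_proba(vocab_matrix)[:, class_index]
    scores = pd.DataFrame({'word': vocab, 'probability': probs})
    return scores.sort_values('probability', ascending=False).head(n)
```

For example, `top_words_for_class(lr, vocab, vocab_matrix, class_index=0)` should reproduce the politics table above, and swapping in the random forest below works the same way.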
As a final demonstration, I will run the task through a Random Forest Classifier.
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(criterion = 'entropy', max_depth = 15)
rf.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
criterion='entropy', max_depth=15, max_features='auto',
max_leaf_nodes=None, max_samples=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100,
n_jobs=None, oob_score=False, random_state=None,
verbose=0, warm_start=False)
y_pred = rf.predict(X_test)
print(classification_report(y_test, y_pred))
precision recall f1-score support
politics 0.91 0.87 0.89 158
sport 0.93 0.95 0.94 301
accuracy 0.92 459
macro avg 0.92 0.91 0.91 459
weighted avg 0.92 0.92 0.92 459
tree_df = pd.DataFrame({'word':vocab ,'probability':rf.predict_proba(vocab_matrix)[:,0]})
tree_df.sort_values('probability', ascending = False).head(10)
| | word | probability |
---|---|---|
3697 | labour | 0.93 |
2892 | labor | 0.92 |
1654 | parliament | 0.90 |
106 | lawmakers | 0.90 |
2028 | political | 0.90 |
5216 | conservative | 0.88 |
508 | party | 0.87 |
2021 | liberal | 0.87 |
2063 | conservatives | 0.86 |
634 | reform | 0.86 |
tree_df.sort_values('probability', ascending = True).head(10)
| | word | probability |
---|---|---|
213 | championship | 0.07 |
2372 | games | 0.07 |
4049 | matches | 0.07 |
4410 | finals | 0.07 |
2020 | scoring | 0.09 |
3202 | beating | 0.09 |
2485 | softball | 0.12 |
2247 | beat | 0.12 |
3764 | thrilling | 0.12 |
3974 | prix | 0.13 |
In this notebook I have proposed and demonstrated a simple approach for adding interpretability to any classifier that takes word embeddings as features. It allows you to sanity-check that what your classifier picks up on is reasonable and, at least to some extent, to explain the data generating process.
Although this approach is not directly applicable to regression problems, you can do something fairly similar... but that is for another post.