
NeuralVecmap (DNN-based embedding mapping)

This is an open source implementation of the nonlinear mapping between embedding sets used in Newman-Griffis and Zirikly (2018), "Embedding Transfer for Low-Resource Medical Named Entity Recognition"; see the Reference section below for the full citation.

The included demo.sh script will download two small sets of embeddings, learn a demonstration mapping between them, and calculate changes in nearest neighbors.

Dependencies

External

Internal (frozen copies of all internal dependencies are included in the lib directory)

Method description

This implementation learns a nonlinear mapping function from a source set of embeddings to a target set, based on shared keys (pivots). The embeddings do not have to be of the same dimensionality, but must have keys in common.
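
For concreteness, the sketches below assume each embedding set is loaded as a plain dict from key to numpy vector. The loader and file names here are hypothetical stand-ins for the repository's actual I/O:

```python
import numpy as np

def load_embeddings(path):
    """Read word2vec-style text embeddings: one 'key v1 v2 ...' line per entry.
    (Hypothetical loader; the repository's actual file handling may differ.)"""
    embeddings = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            fields = line.rstrip().split()
            if len(fields) <= 2:
                continue  # skip blank lines and a possible "count dim" header
            embeddings[fields[0]] = np.array([float(x) for x in fields[1:]])
    return embeddings

source = load_embeddings('source.vec')   # e.g., 100-dimensional
target = load_embeddings('target.vec')   # e.g., 300-dimensional; sizes may differ
shared_keys = sorted(set(source) & set(target))  # candidate pivots
```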

The process follows three steps:

(1) Identification of pivots

Pivot terms used in the mapping process may be selected from the set of keys present in both the source and target embeddings in one of two ways (both strategies are sketched in code after this list):

  • Frequent keys: the top N keys by frequency in the target corpus are used as pivots.
  • Random/all keys: a random subset of N shared keys (or all shared keys, if N is unspecified) is used as pivots.
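
A rough sketch of both selection strategies, assuming the shared_keys list from the loading sketch above and a hypothetical target_frequencies dict of corpus counts:

```python
import random

def select_pivots(shared_keys, n=None, target_frequencies=None, seed=0):
    """Choose pivot keys for learning the mapping.

    If target_frequencies is given, take the top n shared keys by frequency
    in the target corpus; otherwise sample n shared keys at random
    (or use all shared keys when n is None). The seed is arbitrary.
    """
    if target_frequencies is not None and n is not None:
        ranked = sorted(shared_keys,
                        key=lambda k: target_frequencies.get(k, 0),
                        reverse=True)
        return ranked[:n]
    if n is None:
        return list(shared_keys)
    rng = random.Random(seed)
    return rng.sample(list(shared_keys), n)
```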

(2) Learning k-fold projections

Pivot terms are divided into k folds. For each fold, a nonlinear projection is learned as follows (a training sketch appears after the numbered steps):

  1. Construct a feed-forward DNN, taking source embeddings as input and generating output of the same size as target embeddings. Model parameters include:
    • Number of layers
    • Activation function (tanh or ReLU)
    • Dimensionality of hidden layers (by default, same as target embedding size)
  2. Train with minibatch gradient descent over all shared keys in the training set
    • Loss function is batch-wise MSE between output embeddings and reference target embeddings
    • Optimization with Adam
  3. After each epoch (one pass over all shared keys in the training set), evaluate MSE on the held-out set
  4. When held-out MSE stops decreasing, stop training and revert to the best model parameters seen so far
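
A minimal training sketch for a single fold, written here with tf.keras as an illustrative stand-in (the repository's own model code and hyperparameter names may differ):

```python
import tensorflow as tf

def train_fold(train_src, train_tgt, dev_src, dev_tgt,
               num_layers=2, activation='tanh', hidden_dim=None):
    """Learn one fold's nonlinear projection from source to target space.

    train_src/train_tgt: (n_train, src_dim) and (n_train, tgt_dim) arrays
    of pivot embeddings; dev_src/dev_tgt: the held-out fold.
    Hyperparameter names here are illustrative, not the repository's flags.
    """
    tgt_dim = train_tgt.shape[1]
    hidden_dim = hidden_dim or tgt_dim  # default: hidden size = target size
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(train_src.shape[1],)))
    for _ in range(num_layers):
        model.add(tf.keras.layers.Dense(hidden_dim, activation=activation))
    model.add(tf.keras.layers.Dense(tgt_dim))  # linear output in target space
    model.compile(optimizer='adam', loss='mse')  # Adam + batch-wise MSE
    # Stop when held-out MSE stops improving, reverting to the best weights
    stopper = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=1, restore_best_weights=True)
    model.fit(train_src, train_tgt, batch_size=32, epochs=100,
              validation_data=(dev_src, dev_tgt),
              callbacks=[stopper], verbose=0)
    return model
```

A k-fold driver would split the pivot embedding matrices into k folds and call train_fold once per fold, holding each fold out as the early-stopping set in turn.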

(3) Generating final transformation

Getting the final projection of source embeddings into target embedding space is a two-step process (sketched in code below the list):

  1. Take the projection function learned for each trained fold and project all source embeddings
  2. Average the k projections to yield the final projection of the source embeddings
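
As a minimal sketch, assuming fold_models is the list of k trained models from the previous sketch and source_matrix stacks all source embeddings row-wise:

```python
import numpy as np

def project_source(fold_models, source_matrix):
    """Project all source embeddings with each fold's network and average
    the k results to obtain the final mapping into target space."""
    projections = [model.predict(source_matrix, verbose=0)
                   for model in fold_models]
    return np.mean(projections, axis=0)  # shape: (n_source, tgt_dim)
```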

Nearest neighbor analysis

This repository also includes the code used to calculate changes in nearest neighbors after the learned mapping is applied, in the nn-analysis directory (a sketch of the computation follows the file list).

  • nearest_neighbors.py: TensorFlow implementation of nearest neighbor calculation by cosine distance
  • nn_changes.py: script to calculate how often nearest neighbors change after the learned mapping is applied
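
The repository's implementation is in TensorFlow; the following numpy sketch (with hypothetical function names) illustrates the same computation:

```python
import numpy as np

def nearest_neighbors(matrix, k=5):
    """Indices of each row's k nearest rows by cosine similarity."""
    normed = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # exclude each row as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]

def neighbor_change_rate(before, after, k=5):
    """Mean fraction of k nearest neighbors replaced by the mapping.
    Rows of `before` and `after` must correspond to the same keys."""
    nn_before = nearest_neighbors(before, k)
    nn_after = nearest_neighbors(after, k)
    changed = [len(set(b) - set(a)) / k
               for b, a in zip(nn_before, nn_after)]
    return float(np.mean(changed))
```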

Reference

If you use this software in your own work, please cite the following paper:

@inproceedings{Newman-Griffis2018BioNLP,
  author = {Newman-Griffis, Denis and Zirikly, Ayah},
  title = {Embedding Transfer for Low-Resource Medical Named Entity Recognition: A Case Study on Patient Mobility},
  booktitle = {Proceedings of BioNLP 2018},
  year = {2018}
}