GraphNER

This repository contains the code for reproducing the preliminary results reported in the paper "Named Entity Recognition as Graph Classification" (currently under review for the ESWC 2021 Poster Track).

Overview

The code is organized as notebooks, to be used as follows:

final_generate_gazetteers.ipynb: to generate gazetteers from Wikidata (by specifying a list of QIDs corresponding to the entity types that one wishes to extract)
edge_list_generation.ipynb: to generate the graph structure used to build the graph embeddings; when applied to the CoNLL-2003 train dataset, one should get a result similar to this Python dict data structure
graph_embeddings_generation.ipynb: to generate node embeddings using one of the algorithms (e.g. node2vec, SDNE) provided by the GEM library
node2vec_classification.ipynb: to train a model for the node2vec embeddings
transE_classification.ipynb: to train a model for the Trans-E embeddings
autoencoder_embeddings.ipynb: to generate auto-encoder embeddings from the binary graph representations
autoencoder_classification.ipynb: to train a model for the auto-encoder embeddings
GCN_classification.ipynb: to train a Graph Convolutional Network (based on this architecture)
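
To make the gazetteer step in the list above more concrete, here is a minimal sketch of how a list of QIDs could be turned into a SPARQL query for the Wikidata Query Service. It is not taken from final_generate_gazetteers.ipynb: the function name build_gazetteer_query, its parameters, and the choice to match only direct instances (wdt:P31) are assumptions made for illustration. The sketch only builds the query string and does not send it anywhere.

```python
def build_gazetteer_query(type_qids, limit=1000, lang="en"):
    """Build a SPARQL query listing labelled instances of the given classes.

    Hypothetical helper, not part of this repository.
    """
    if not type_qids:
        raise ValueError("at least one QID is required")
    for qid in type_qids:
        # A Wikidata QID is the letter Q followed by digits, e.g. Q5.
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"not a Wikidata QID: {qid!r}")

    values = " ".join(f"wd:{qid}" for qid in type_qids)
    return (
        "SELECT ?item ?label ?type WHERE {\n"
        f"  VALUES ?type {{ {values} }}\n"
        # Assumption: direct "instance of" (P31) only; subclasses are ignored.
        "  ?item wdt:P31 ?type .\n"
        f"  ?item rdfs:label ?label . FILTER(LANG(?label) = \"{lang}\")\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Q5 is "human" and Q515 is "city" on Wikidata.
query = build_gazetteer_query(["Q5", "Q515"])
print(query)
```

The resulting string could then be submitted to the public endpoint at https://query.wikidata.org/sparql with any HTTP or SPARQL client; the notebook itself may proceed differently.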

The code will be streamlined into stand-alone configurable scripts and fully documented soon.

⚠️ This code runs on a CUDA 11.0-enabled GPU; please install the versions of the modules compatible with your hardware.

The table below shows the best performance of the different models on the validation set (dev) of CoNLL-2003:

Method	Accuracy	Micro-F1	Macro-F1
Auto-encoder	91.8	91.5	71.7
Node2Vec	93.8	94.1	82.1
Trans-E	94.1	93.6	78.8
GCN	96.5	96.5	88.8
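
The gap between the Micro-F1 and Macro-F1 columns is typical of imbalanced label sets: micro-averaging pools all decisions, whereas macro-averaging gives each class equal weight, so rare classes pull it down. The sketch below illustrates the two averages on toy labels. The function f1_scores and the toy data are hypothetical, and the exact evaluation protocol behind the table (for instance, which labels are averaged over) is not specified here, so this is only an illustration, not the evaluation code of this repository.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy example: the frequent "O" class is easy, the rare "LOC" class is missed.
y_true = ["O"] * 8 + ["PER", "LOC"]
y_pred = ["O"] * 8 + ["PER", "O"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case a single missed rare-class token barely affects the micro average but costs a full third of the macro average, which is the same qualitative pattern as in the table.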

As for test set performance: