A named entity recognition engine based on UIMA Tokens Regex

Usage

Spotting named entities of type `Person`

Lang lang = Lang.FR;

// The tokenizer AE, borrowed from TermSuite
AnalysisEngineDescription tokenizerAE = TermSuiteAEFactory.createWordTokenizerAEDesc(lang);

// The Person NER AE
AnalysisEngineDescription personAE = TokensNERFactory.createPersonNEREngine(lang);

// The Aggregated AE
AnalysisEngineDescription aed = AnalysisEngineFactory.createEngineDescription(
      tokenizerAE,
      personAE);

// Run on a given sentence
JCas cas = JCasFactory.createJCas();
cas.setDocumentText("Emmanuel Macron est le nouveau président.");
AnalysisEngine engine = UIMAFramework.produceAnalysisEngine(aed);
engine.process(cas);

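// Iterate over spotted named entities and print the text each one covers
Iterator<NamedEntity> it = cas.getAnnotationIndex(NamedEntity.class).iterator();
while (it.hasNext()) {
    NamedEntity entity = it.next();
    System.out.println(entity.getCoveredText());
}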

The AE will create a NamedEntity annotation every time one of the UIMA Tokens Regex person rules matches.

Resources involved are:

UIMA Tokens Regex person rules
List of first names
List of civil titles
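
To make the interplay of these three resources concrete, here is a minimal sketch in plain Java. It does not use UIMA or the actual UIMA Tokens Regex rule syntax. The class `PersonRuleSketch`, the tiny `FIRST_NAMES` and `CIVIL_TITLES` sets, and the rule itself (an optional civil title, then a known first name, then one or more capitalized tokens) are all assumptions made for illustration; the real person rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PersonRuleSketch {
    // Hypothetical stand-ins for the first-name and civil-title resources.
    static final Set<String> FIRST_NAMES = Set.of("Emmanuel", "Marie");
    static final Set<String> CIVIL_TITLES = Set.of("Monsieur", "Madame", "Docteur");

    // Naive tokenizer (an assumption): every run of letters is one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\p{L}+").matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Illustrative rule: [civil title]? first-name capitalized-token+
    static List<String> spotPersons(String text) {
        List<String> tokens = tokenize(text);
        List<String> persons = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int j = i;
            // Optional civil title.
            if (CIVIL_TITLES.contains(tokens.get(j))) {
                j++;
            }
            // Mandatory first name taken from the list.
            if (j < tokens.size() && FIRST_NAMES.contains(tokens.get(j))) {
                int k = j + 1;
                // One or more capitalized tokens, treated as the family name.
                while (k < tokens.size()
                        && Character.isUpperCase(tokens.get(k).charAt(0))) {
                    k++;
                }
                if (k > j + 1) {
                    persons.add(String.join(" ", tokens.subList(i, k)));
                    i = k; // resume after the match
                    continue;
                }
            }
            i++;
        }
        return persons;
    }

    public static void main(String[] args) {
        // Prints [Madame Marie Curie]
        System.out.println(spotPersons("Madame Marie Curie est physicienne."));
    }
}
```

In the engine itself, the match would be recorded as a `NamedEntity` annotation in the CAS rather than returned as a string.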

nantesnlp/uima-tokens-ner
