uima-tokens-ner

A named entity recognition engine based on UIMA Tokens regex

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

Usage

Spotting named entities of type Person

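// UIMA / uimaFIT imports. Lang, TermSuiteAEFactory, TokensNERFactory and
// NamedEntity must also be imported from TermSuite and this project.
import java.util.Iterator;
import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

Lang lang = Lang.FR;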

// The tokenizer AE, borrowed from TermSuite
AnalysisEngineDescription tokenizerAE = TermSuiteAEFactory.createWordTokenizerAEDesc(lang);

// The Person NER AE
AnalysisEngineDescription personAE = TokensNERFactory.createPersonNEREngine(lang);

// The Aggregated AE
AnalysisEngineDescription aed = AnalysisEngineFactory.createEngineDescription(
      tokenizerAE,
      personAE);

// Run on a given sentence
JCas cas = JCasFactory.createJCas();
cas.setDocumentText("Emmanuel Macron est le nouveau président.");
AnalysisEngine engine = UIMAFramework.produceAnalysisEngine(aed);
engine.process(cas);

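// Iterate over spotted named entities and print their text and offsets
Iterator<NamedEntity> it = cas.getAnnotationIndex(NamedEntity.class).iterator();
while (it.hasNext()) {
      NamedEntity ne = it.next();
      System.out.println(ne.getCoveredText() + " [" + ne.getBegin() + ", " + ne.getEnd() + "]");
}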

The Person NER AE creates a NamedEntity annotation each time one of the UIMA Tokens Regex person rules matches.
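To make the "rule matches, annotation created" idea concrete, here is a dependency-free sketch of the general technique: tokenize while keeping character offsets, then scan the token sequence with a token-level pattern and emit an entity span for each match. This is NOT the UIMA Tokens Regex API and not this project's rules. The class name PersonRuleSketch, the toy FIRST_NAMES lexicon and the single rule "first name followed by one or more capitalized tokens" are all hypothetical, chosen only for illustration (requires Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "token-level rule -> entity span".
// Not the UIMA Tokens Regex API; all names here are invented for the sketch.
public class PersonRuleSketch {

    // A token with its character offsets, analogous to a word annotation.
    record Token(String text, int begin, int end) {}

    // A spotted span, standing in for a NamedEntity annotation.
    record Entity(String text, int begin, int end) {}

    // Toy first-name lexicon (assumption: real rules use richer resources).
    static final Set<String> FIRST_NAMES = Set.of("Emmanuel", "Marie", "Jean");

    // Step 1: split into runs of letters/digits or single punctuation marks, keeping offsets.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+|\\S").matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    static boolean isCapitalized(Token t) {
        return Character.isUpperCase(t.text().charAt(0));
    }

    // Step 2: apply the hypothetical rule  FirstName Capitalized+  over the tokens.
    static List<Entity> spotPersons(String text) {
        List<Token> tokens = tokenize(text);
        List<Entity> entities = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (FIRST_NAMES.contains(tokens.get(i).text())) {
                int j = i + 1;
                // Greedily extend over the following capitalized tokens.
                while (j < tokens.size() && isCapitalized(tokens.get(j))) {
                    j++;
                }
                if (j > i + 1) {
                    int begin = tokens.get(i).begin();
                    int end = tokens.get(j - 1).end();
                    entities.add(new Entity(text.substring(begin, end), begin, end));
                    i = j; // resume after the match so spans never overlap
                    continue;
                }
            }
            i++;
        }
        return entities;
    }

    public static void main(String[] args) {
        for (Entity e : spotPersons("Emmanuel Macron est le nouveau président.")) {
            System.out.println(e.text() + " [" + e.begin() + ", " + e.end() + "]");
        }
    }
}
```

Running main prints `Emmanuel Macron [0, 15]`. Matching over tokens rather than raw characters is what lets a rule talk about "a first name followed by capitalized words" while still producing exact character offsets for the resulting annotation.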

Resources involved are: