A named entity recognition (NER) engine based on UIMA Tokens Regex.
Lang lang = Lang.FR;
// The tokenizer AE, borrowed from TermSuite
AnalysisEngineDescription tokenizerAE = TermSuiteAEFactory.createWordTokenizerAEDesc(lang);
// The Person NER AE
AnalysisEngineDescription personAE = TokensNERFactory.createPersonNEREngine(lang);
// The Aggregated AE
AnalysisEngineDescription aed = AnalysisEngineFactory.createEngineDescription(
    tokenizerAE,
    personAE);
// Run on a given sentence
JCas cas = JCasFactory.createJCas();
cas.setDocumentText("Emmanuel Macron est le nouveau président.");
AnalysisEngine engine = UIMAFramework.produceAnalysisEngine(aed);
engine.process(cas);
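// Iterate over the spotted named entities and print the text each one covers
Iterator<NamedEntity> it = cas.getAnnotationIndex(NamedEntity.class).iterator();
while (it.hasNext()) {
    NamedEntity entity = it.next();
    System.out.println(entity.getCoveredText());
}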
The person NER AE creates a NamedEntity annotation every time one of the UIMA Tokens Regex person rules matches.
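To give an intuition for what "a rule matching over tokens" means, here is a minimal plain-Java sketch. It does not use UIMA or the UIMA Tokens Regex rule syntax, and the rule it applies (two or more consecutive capitalized tokens count as a person) is a hypothetical stand-in, not one of the actual person rules. The class name `PersonRuleSketch` and the method `findPersons` are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: a toy token-level rule, not the real person rules. */
final class PersonRuleSketch {

    // A token is either a run of letters/digits or a single punctuation mark.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\p{Punct}");

    // Hypothetical token test: an uppercase letter followed by at least one more letter.
    private static final Pattern CAPITALIZED = Pattern.compile("\\p{Lu}\\p{L}+");

    /** Returns the text of every run of two or more consecutive capitalized tokens. */
    static List<String> findPersons(String text) {
        // Step 1: tokenize, keeping the [begin, end) offsets of each token.
        List<int[]> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(new int[] {m.start(), m.end()});
        }

        // Step 2: slide over the token sequence and apply the rule.
        List<String> persons = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int j = i;
            while (j < tokens.size() && isCapitalized(text, tokens.get(j))) {
                j++;
            }
            if (j - i >= 2) {
                // The match spans from the first token's begin to the last token's end.
                persons.add(text.substring(tokens.get(i)[0], tokens.get(j - 1)[1]));
            }
            i = Math.max(j, i + 1);
        }
        return persons;
    }

    private static boolean isCapitalized(String text, int[] span) {
        return CAPITALIZED.matcher(text.substring(span[0], span[1])).matches();
    }

    public static void main(String[] args) {
        // Prints [Emmanuel Macron]
        System.out.println(findPersons("Emmanuel Macron est le nouveau président."));
    }
}
```

A rule this naive also fires on a capitalized sentence-initial word followed by a name, which is one reason a real rule set is more elaborate than this sketch.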
Resources involved are: