apache/uima-uimaj

NullPointerException while creating engine instance and execution

azazali30 opened this issue · 10 comments

Describe the bug
We are running analysis using a JCasPool that can hold up to 60 JCas objects at a time. After upgrading to UIMA 3.4.1, we started seeing this NullPointerException in ResultSpecification_impl.intersect.
Stack trace for reference:

error: org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.    
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:415)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:299)
	at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:590)
	at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:422)
	at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:352)
	at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:276)
	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:295)
	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:312)
	at com.pega.nlp.textanalytics.engines.pool.AnalysisEnginePoolHolder.analyze(AnalysisEnginePoolHolder.java:214)
	at com.pega.nlp.textanalytics.accessor.TextAnalyticsAccessor.runTextAnalytics(TextAnalyticsAccessor.java:117)
	at com.pega.nlp.textanalytics.accessor.TextAnalyticsAccessor.runTextAnalytics(TextAnalyticsAccessor.java:62)
	at com.pega.nlp.textanalytics.accessor.TextAnalyticsAccessorTest.lambda$testConcurrency$0(TextAnalyticsAccessorTest.java:225)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
	at org.apache.uima.analysis_engine.impl.ResultSpecification_impl.intersect(ResultSpecification_impl.java:743)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
	... 15 more

Please complete the following information:

  • Version: UIMA 3.4.1, UIMA Ruta 3.3.0
  • OS: Linux, OS X

Are you actually using result specifications in your setup?

No, we are not using them. I am not sure how they are used internally.

Are you declaring any output capabilities in your XML descriptors or using uimaFIT annotations?

Are you calling the process method or a similar method of a UIMA component from multiple concurrent threads?

Below is the gist of the main code, extracted from my code base:

org.apache.uima.util.JCasPool jCasPool = new JCasPool(poolSize, aae);

List<AnalysisEngineDescription> extractors = // list of annotators
final AnalysisEngineDescription aaeDesc = org.apache.uima.fit.factory.AnalysisEngineFactory
    .createEngineDescription(extractors.toArray(new AnalysisEngineDescription[extractors.size()]));
AnalysisEngine engine = org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(aaeDesc);

for (String text : textsArray) {
    // take the JCas before the try block so that it is in scope in finally
    JCas jCas = jCasPool.getJCas();
    try {
        // update jCas with text
        engine.process(jCas);
    } finally {
        // always hand the JCas back to the pool
        jCasPool.releaseJCas(jCas);
    }
}

Ok, but is this code called from multiple threads? Note that UIMA components are not expected to be thread-safe. When UIMA parallelizes, it creates multiple instances of a component - one for each of the parallel threads. A component may declare that it is not parallelizable (e.g. writers or components with static fields), then UIMA would not parallelize the component at all and only use a single single-threaded instance of this component.

Are you trying to share a component across multiple concurrent threads?

We are caching the engine created with AnalysisEngine engine = org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(aaeDesc);
so every thread will be using the same instance of AnalysisEngine. Is that fine?

If every thread is using the same instance of the analysis engine, then you are sharing that instance across threads. This is not supported. Every thread must have its own instance.
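
One way to follow this rule is to let each worker thread lazily create its own engine and keep reusing it. The sketch below only illustrates that pattern in plain Java. FakeEngine is a hypothetical stand-in for a non-thread-safe component, and PerThreadComponent is a made-up helper, not part of UIMA. In a real setup the supplier would build the engine instead, for example by calling AnalysisEngineFactory.createEngine(aaeDesc) and handling its checked exception.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Demo: every worker thread gets, and keeps reusing, its own engine instance. */
final class PerThreadEngineDemo {

    /** Runs the given number of threads and returns how many distinct engines were used. */
    static int countDistinctEngines(int threads, int docsPerThread) {
        PerThreadComponent<FakeEngine> engines = new PerThreadComponent<>(FakeEngine::new);
        Set<FakeEngine> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int d = 0; d < docsPerThread; d++) {
                    // Returns the same instance on every call made from this thread.
                    FakeEngine engine = engines.get();
                    engine.process("doc " + d);
                    seen.add(engine);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // One engine per thread, regardless of how many documents each thread handles.
        System.out.println(countDistinctEngines(4, 10)); // prints 4
    }
}

/** Hypothetical helper: hands each calling thread its own lazily created instance. */
final class PerThreadComponent<T> {
    private final ThreadLocal<T> local;

    PerThreadComponent(Supplier<T> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    T get() {
        return local.get();
    }
}

/** Hypothetical stand-in for an engine: it keeps mutable state and is not thread-safe. */
final class FakeEngine {
    private int processed = 0;

    int process(String text) {
        processed++;
        return processed;
    }
}
```

With a thread pool, an instance created this way lives as long as its worker thread, so the number of engines is bounded by the number of threads rather than by the number of documents.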

@reckart I wonder why the JCasPool documentation says we can use this pool when multiple CASes need to be processed simultaneously.
Also, JCasPool has a constructor that accepts an AnalysisEngine as a parameter, which suggests that it creates these JCas instances using the same AE. Can you help me understand why this does not contradict your statement? Thanks.

The creation of a new CAS can be an expensive process. Thus, instead of creating a new CAS object for every document, it can be sensible to maintain a pool of CAS objects which are reused while processing a batch of documents.

The CAS pool needs to know information like the type system, index definitions, etc. which can be obtained from an analysis engine - it does not need the engine itself. The constructor that takes an engine is a convenience constructor. The relevant one is org.apache.uima.util.JCasPool.JCasPool(int, ProcessingResourceMetaData) which only considers the configuration, not the actual engine.
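
The reuse idea described above can be sketched without any UIMA dependency. In the following illustration, FakeCas and FakeCasPool are hypothetical stand-ins and do not reflect how JCasPool is actually implemented. The point is only that a fixed number of expensive objects is created once, borrowed per document, and returned afterwards, following the same acquire and release pattern as getJCas and releaseJCas in the snippet earlier in this thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Demo: a small pool of reusable objects serves an arbitrary number of documents. */
final class CasPoolDemo {

    /** Processes all docs with a pool of the given size; returns {instances created, idle afterwards}. */
    static int[] run(int poolSize, String[] docs) {
        AtomicInteger created = new AtomicInteger();
        FakeCasPool pool = new FakeCasPool(poolSize, () -> {
            created.incrementAndGet();
            return new FakeCas();
        });
        for (String doc : docs) {
            FakeCas cas = pool.acquire();
            try {
                cas.setText(doc);
                // ... the analysis of this document would happen here ...
            } finally {
                pool.release(cas); // always return the object, even if processing fails
            }
        }
        return new int[] { created.get(), pool.idleCount() };
    }

    public static void main(String[] args) {
        int[] result = run(2, new String[] { "a", "b", "c", "d", "e" });
        System.out.println(result[0] + " created, " + result[1] + " idle"); // prints "2 created, 2 idle"
    }
}

/** Hypothetical pool: creates all instances up front and blocks when none is idle. */
final class FakeCasPool {
    private final BlockingQueue<FakeCas> idle;

    FakeCasPool(int size, Supplier<FakeCas> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    FakeCas acquire() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled object", e);
        }
    }

    void release(FakeCas cas) {
        cas.reset(); // clear the previous document before the next use
        idle.add(cas);
    }

    int idleCount() {
        return idle.size();
    }
}

/** Hypothetical stand-in for a CAS: it holds the text of one document at a time. */
final class FakeCas {
    private String text;

    void setText(String newText) {
        text = newText;
    }

    String getText() {
        return text;
    }

    void reset() {
        text = null;
    }
}
```

Note that the pool here never touches an engine: it only needs to know how to create its objects, which matches the remark that the real pool needs configuration rather than the engine itself.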