QueryBatcher never stops when call to get URIs fails
ralfhergert opened this issue · 5 comments
We are trying to execute a long-running query using withConsistentSnapshot=true. Depending on the configuration of our ML-DB, the QueryBatcher may receive a server error as soon as the ML-DB is no longer capable of providing the snapshot. That in itself is not an issue. What is a problem for us is that, when this exception is thrown:
- the QueryBatcher does not stop itself and still considers itself as working/running
- no FailureListener attached to the QueryBatcher is called
- the exception is just logged
This is what the log messages look like when all worker threads die due to a server-side error:
Exception in thread "pool-11-thread-1" com.marklogic.client.FailedRequestException: Local message: failed to apply resource at internal/uris: Internal Server Error. Server Message: Server (not a REST instance?) did not respond with an expected REST Error message.
at com.marklogic.client.impl.OkHttpServices.checkStatus(OkHttpServices.java:4449)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3382)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3438)
at com.marklogic.client.impl.QueryManagerImpl.uris(QueryManagerImpl.java:169)
at com.marklogic.client.impl.OkHttpServices.uris(OkHttpServices.java:3030)
at com.marklogic.client.datamovement.impl.QueryBatcherImpl$QueryTask.run(QueryBatcherImpl.java:738)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3373)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at com.marklogic.client.impl.OkHttpServices.processQuery(OkHttpServices.java:3130)
at java.base/java.lang.Thread.run(Unknown Source)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3373)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at com.marklogic.client.impl.OkHttpServices.processQuery(OkHttpServices.java:3130)
at com.marklogic.client.datamovement.impl.QueryBatcherImpl$QueryTask.run(QueryBatcherImpl.java:738)
at java.base/java.lang.Thread.run(Unknown Source)
at com.marklogic.client.impl.OkHttpServices.checkStatus(OkHttpServices.java:4449)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3382)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
Exception in thread "pool-11-thread-2" com.marklogic.client.FailedRequestException: Local message: failed to apply resource at internal/uris: Internal Server Error. Server Message: Server (not a REST instance?) did not respond with an expected REST Error message.
at com.marklogic.client.impl.OkHttpServices.uris(OkHttpServices.java:3030)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3438)
at com.marklogic.client.impl.QueryManagerImpl.uris(QueryManagerImpl.java:169)
Exception in thread "pool-11-thread-3" com.marklogic.client.FailedRequestException: Local message: failed to apply resource at internal/uris: Internal Server Error. Server Message: Server (not a REST instance?) did not respond with an expected REST Error message.
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3373)
at com.marklogic.client.datamovement.impl.QueryBatcherImpl$QueryTask.run(QueryBatcherImpl.java:738)
at com.marklogic.client.impl.OkHttpServices.checkStatus(OkHttpServices.java:4449)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at com.marklogic.client.impl.OkHttpServices.uris(OkHttpServices.java:3030)
at java.base/java.lang.Thread.run(Unknown Source)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3438)
at com.marklogic.client.impl.QueryManagerImpl.uris(QueryManagerImpl.java:169)
at com.marklogic.client.impl.OkHttpServices.processQuery(OkHttpServices.java:3130)
at com.marklogic.client.impl.OkHttpServices.postResource(OkHttpServices.java:3382)
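For context, the "Exception in thread ..." lines above are what the JVM prints when a task run through ThreadPoolExecutor.execute() throws and nothing catches it. The following is a minimal, JDK-only sketch of that mechanism. It does not use the MarkLogic client at all; the class name, the method name and the simulated exception are made up for illustration. It shows that the worker thread dies and the error is only printed, while the pool itself keeps reporting that it has not been shut down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of any library.
class SilentWorkerDeath {

    /** Returns true if the pool still looks "running" after its only task threw. */
    static boolean poolStillRunningAfterFailure() {
        CountDownLatch workerDied = new CountDownLatch(1);

        // Wrap the default factory so we can observe the uncaught exception.
        // The handler mimics the JVM's default "Exception in thread ..." output.
        ThreadFactory base = Executors.defaultThreadFactory();
        ThreadFactory factory = r -> {
            Thread t = base.newThread(r);
            t.setUncaughtExceptionHandler((thread, e) -> {
                System.err.println("Exception in thread \"" + thread.getName() + "\" " + e);
                workerDied.countDown();
            });
            return t;
        };

        ExecutorService pool = Executors.newFixedThreadPool(1, factory);
        try {
            // Stand-in for a task whose call to fetch URIs fails.
            pool.execute(() -> {
                throw new IllegalStateException("simulated: failed to fetch URIs");
            });
            if (!workerDied.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not fail in time");
            }
            // Nobody was notified and nothing shut the pool down.
            return !pool.isShutdown() && !pool.isTerminated();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool still running: " + poolStillRunningAfterFailure());
    }
}
```

Anything that derives its "running" state from the executor alone would therefore never notice that the work has stopped, which matches the behaviour described in this report.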
We would expect:
- that instead of logging the exception(s), the QueryBatcher calls the registered FailureListeners
- that the QueryBatcher no longer considers itself to be "running" (since all its worker threads are now dead)
- the exception should not be logged when the error is escalated
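To make these expectations concrete, here is a rough, JDK-only sketch of the behaviour we have in mind. It is not the QueryBatcher implementation and it uses none of the MarkLogic API; EscalatingJob, onFailure, submit, isStopped and awaitStop are hypothetical names chosen for this example. The idea is simply that the task wrapper catches the failure, hands it to the registered listeners instead of logging it, and marks the job as stopped.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical sketch, not the real QueryBatcher.
class EscalatingJob {
    private final List<Consumer<Throwable>> failureListeners = new CopyOnWriteArrayList<>();
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    /** Registers a listener that receives every failure raised by a task. */
    void onFailure(Consumer<Throwable> listener) {
        failureListeners.add(listener);
    }

    boolean isStopped() {
        return stopped.get();
    }

    /** Runs the task; on failure, escalates to the listeners and stops the job. */
    void submit(Runnable fetchUris) {
        pool.execute(() -> {
            try {
                fetchUris.run();
            } catch (Throwable t) {
                // Escalate instead of logging.
                for (Consumer<Throwable> listener : failureListeners) {
                    listener.accept(t);
                }
                // Only the first failure flips the state and shuts the pool down.
                if (stopped.compareAndSet(false, true)) {
                    pool.shutdown();
                }
            }
        });
    }

    /** Waits until the job has stopped; returns false on timeout or interrupt. */
    boolean awaitStop(long timeout, TimeUnit unit) {
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        EscalatingJob job = new EscalatingJob();
        job.onFailure(t -> System.out.println("listener got: " + t.getMessage()));
        job.submit(() -> {
            throw new IllegalStateException("simulated: failed to fetch URIs");
        });
        System.out.println("stopped in time: " + job.awaitStop(5, TimeUnit.SECONDS));
        System.out.println("isStopped: " + job.isStopped());
    }
}
```

With something along these lines, a caller waiting on the job would return instead of hanging, and the failure would reach application code through the listener rather than only appearing in the log.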
@ralfhergert, the expectations seem quite sensible and presumably will bear up under investigation.
Have you noticed whether the error log or the request log for the appserver on the enode host has a related server-side error?
@georgeajit, for what it's worth, my wild guess is that the server is not sending the error in JSON format. If so, the client would be unable to parse it. That is, the fix would have two parts. On the server, the appserver should respect the error format header for all errors. On the client, the catch should provide special handling for the unsupportable timestamp per the expectations given in the issue report.
I checked for server-side errors, but could not find any 400/500 errors. The AppServer for the database from which we are trying to retrieve the documents listens on port 8040. It is configured to use "/MarkLogic/rest-api/error-handler.xqy" as its "error handler". The log level is currently "info".
In fact, in 8040_AccessLog.txt I did find the POST requests for the next pages and all subsequent GET requests for the individual documents. But then the requests suddenly stop. No request is answered with a 400- or 500-category response.
Likewise, 8040_ErrorLog.txt, 8040_RequestLog.txt and the general ErrorLog.txt show no correlating error message.
BTW, we are using server version 9.0-13.4 and version 5.5.0 of the Java client.
Thanks, @ralfhergert, for following up.
@ralfhergert Apologies for the delay in responding. Your diagnosis is correct, and this is the same issue as #1287 - if the call to get URIs fails for any reason, the exception is logged but the job is not stopped and thus it hangs indefinitely. We are tracking this internally but will leave this ticket open as a reminder for us to respond to you when a fix is included in a release.
Resolved in 6.1.0