opencypher/cypher-for-gremlin

Cypher query not terminating and gremlin server becomes unresponsive

SarthakGhosh16 opened this issue · 11 comments

We are using cypher-for-gremlin in our project and are seeing the following issue: when a very generic query takes the server too long to return data, the query doesn't time out and the host becomes unavailable.

Error: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists

We did a manual installation of the openCypher server plugin. Below is my gremlin-server.yaml file:

host: 0
port: 8182
scriptEvaluationTimeout: 120000
serializedResponseTimeout: 120000
threadPoolWorker: 16
gremlinPool: 8
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  fci: conf/janusgraph-hbase.properties,insights: conf/janusgraph-insights-hbase.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}},
    scripts: [scripts/empty-sample.groovy],
    staticImports: ['org.opencypher.gremlin.process.traversal.CustomPredicates.*']}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry,org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
  - { className: org.opencypher.gremlin.server.op.cypher.CypherOpProcessor, config: { sessionTimeout: 28800000}}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: false, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096 
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 81928192
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
enabled: false}

As you can see, we have set scriptEvaluationTimeout, but Cypher queries don't respect it. Normal Gremlin queries terminate after the configured time.
Below is the gremlin version we are using.

gremlin> Gremlin.version()
==>3.3.3

Hello @SarthakGhosh16, this sounds like a bug, but so far I haven't been able to reproduce it with the provided config. I will continue the investigation on Monday.

@dwitry Sure. Let me know if you need any other info. Thanks!

@dwitry did you find anything?

Hello @SarthakGhosh16,

Yes, this is in fact a bug. The Cypher for Gremlin plugin is ignoring scriptEvaluationTimeout.

Will look into fixing it.
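
For readers unfamiliar with what honoring an evaluation timeout involves, the general idea can be sketched in plain Java as below. This is only an illustration under stated assumptions: it is not the plugin's code and not the fix for this issue, and the class name `EvaluationTimeoutSketch` and method `evalWithTimeout` are hypothetical. The task stands in for a query evaluation, and the timeout plays the role of scriptEvaluationTimeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: not Cypher for Gremlin or Gremlin Server code.
final class EvaluationTimeoutSketch {

    // Runs the task on a worker thread and stops waiting after timeoutMillis.
    // On timeout the worker is interrupted; this only ends the task if the
    // task itself responds to interruption (Thread.sleep does).
    static <T> T evalWithTimeout(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = worker.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // request interruption of the running task
                throw e;
            }
        } finally {
            worker.shutdownNow(); // release the worker thread in every case
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast "query" completes within the limit.
        String fast = evalWithTimeout(() -> "result", 1_000);
        System.out.println(fast);

        // A slow "query" is cut off instead of occupying the worker indefinitely.
        try {
            evalWithTimeout(() -> { Thread.sleep(60_000); return "never"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

Without a step like the `future.get(timeout, ...)` call and the cancellation that follows it, a long-running evaluation keeps its worker thread busy for as long as it runs, which is consistent with the unresponsive server described above.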

@dwitry Ok. Thanks!
Is there any workaround we could try in the meantime, and do you have an estimate of when this can be fixed?
opencypher is actually an important feature in our product.

I've committed a fix that should make the Cypher plugin honor scriptEvaluationTimeout.

The next step will be to add a scriptEvaluationTimeout configuration option to the Cypher client. After that and a fix for #314, we plan to do a release.

Unfortunately, there is no known workaround. Until the fix is released, you can try using the snapshot version.

I'll wait for an official release then. Let me know when you've released it.
Thanks for the help.

@SarthakGhosh16
I came here from #312. You could try increasing writeBufferHighWaterMark in gremlin-server.yaml and increasing the Xmx of the Gremlin Server.
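
As background for the writeBufferHighWaterMark suggestion, here is a small, simplified model of how high and low write-buffer water marks usually behave: once the bytes waiting to be sent exceed the high mark, the channel stops accepting writes, and it accepts them again only after the backlog drains below the low mark. The class `WaterMarkSketch` is hypothetical and is not Netty or Gremlin Server code; the two limits are the writeBufferLowWaterMark and writeBufferHighWaterMark values from the config above.

```java
// Hypothetical model of write-buffer water marks; not Netty or Gremlin Server code.
final class WaterMarkSketch {
    private final int low;
    private final int high;
    private int pendingBytes = 0;
    private boolean writable = true;

    WaterMarkSketch(int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("low mark must not exceed high mark");
        }
        this.low = low;
        this.high = high;
    }

    // Queue bytes for sending; exceeding the high mark pauses further writes.
    void enqueue(int bytes) {
        pendingBytes += bytes;
        if (pendingBytes > high) {
            writable = false;
        }
    }

    // Record bytes flushed to the network; dropping below the low mark resumes writes.
    void flushed(int bytes) {
        pendingBytes = Math.max(0, pendingBytes - bytes);
        if (pendingBytes < low) {
            writable = true;
        }
    }

    boolean isWritable() {
        return writable;
    }

    public static void main(String[] args) {
        // Values taken from the gremlin-server.yaml in this issue.
        WaterMarkSketch channel = new WaterMarkSketch(32_768, 65_536);
        channel.enqueue(70_000);                    // backlog exceeds the high mark
        System.out.println(channel.isWritable());
        channel.flushed(20_000);                    // 50,000 pending: still above the low mark
        System.out.println(channel.isWritable());
        channel.flushed(30_000);                    // 20,000 pending: below the low mark
        System.out.println(channel.isWritable());
    }
}
```

Under this model, raising the high mark lets more response data queue up before writes pause, at the cost of more memory per connection, which is presumably why the suggestion pairs it with a larger Xmx.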

Fixed in #316 and #318

Hello @SarthakGhosh16 ,

A new version has been released that contains the fix for this issue.

Please test if everything works as expected.

Thanks @dwitry. I'll try it.
Q: Is cypher-gremlin-server-client v1.0.3 compatible with cypher-gremlin-server-plugin-0.9.13-all.jar?
My gremlin version is 3.3.3