“No data nodes with HTTP-enabled available” error when writing from Spark to elasticsearch on Google Dataproc
Closed this issue · 2 comments
bw2 commented
I'm not sure if this is an ES cluster config issue or a firewall/Google Dataproc issue, so I've also posted it to
I'm getting
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:576)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Error summary: EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
when trying to write from Spark to the elasticsearch cluster.
If I set the elasticsearch config option es.nodes.client.only=true,
it starts working.
I'm wondering if there's a way to avoid having to set es.nodes.client.only=true
and instead write directly to the data nodes.
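For reference, the workaround I'm using looks roughly like this (a sketch — the host, index name, and DataFrame are placeholders, not from my actual job):

```scala
// Sketch of the es.nodes.client.only workaround, assuming the
// elasticsearch-spark connector is on the classpath.
import org.elasticsearch.spark.sql._

val cfg = Map(
  "es.nodes" -> "es-client-host:9200",  // placeholder client-node endpoint
  "es.nodes.client.only" -> "true"      // route all traffic through client nodes
)
df.saveToEs("my-index/doc", cfg)        // df and index name are placeholders
```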
pires commented
It seems Spark uses the HTTP API to access the data nodes directly. This is not recommended, but you can enable it by setting
the following environment variable to true: https://github.com/pires/kubernetes-elasticsearch-cluster/blob/master/es-data.yaml#L48-L49
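In the linked manifest that setting is an env var on the data-node container; the change would look roughly like this (variable name taken from the linked file — verify against the current version of the repo):

```yaml
# es-data.yaml, in the data-node container's env section:
env:
- name: HTTP_ENABLE
  value: "true"   # data nodes ship with HTTP disabled by default
```

After changing it, redeploy the data nodes so the new environment takes effect.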
bw2 commented
that fixed it, thanks!