“No data nodes with HTTP-enabled available” error when writing from Spark to elasticsearch on Google Dataproc
Closed this issue · 2 comments
bw2 commented
I'm not sure if this is an ES cluster config issue or a firewall/Google Dataproc issue, so I've also posted it to
I'm getting
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:576)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Error summary: EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
when trying to write from Spark to the elasticsearch cluster.
If I set the elasticsearch config option es.nodes.client.only=true,
it starts working.
I'm wondering if there's a way to avoid having to set es.nodes.client.only=true
and instead write directly to the data nodes.
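For reference, the workaround I'm using looks roughly like this (a sketch — the host, index name, and DataFrame are placeholders, not from my actual job):

```scala
// Sketch of the es.nodes.client.only workaround, assuming the
// elasticsearch-spark connector is on the classpath.
import org.elasticsearch.spark.sql._

val cfg = Map(
  "es.nodes" -> "es-client-host:9200",  // placeholder client-node endpoint
  "es.nodes.client.only" -> "true"      // route all traffic through client nodes
)
df.saveToEs("my-index/doc", cfg)        // df and index name are placeholders
```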
pires commented
It seems Spark uses the HTTP API to access the data nodes directly. This is not recommended, but you can enable it by setting
the following environment variable to true: https://github.com/pires/kubernetes-elasticsearch-cluster/blob/master/es-data.yaml#L48-L49
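In the linked manifest that setting is an env var on the data-node container; the change would look roughly like this (variable name taken from the linked file — verify against the current version of the repo):

```yaml
# es-data.yaml, in the data-node container's env section:
env:
- name: HTTP_ENABLE
  value: "true"   # data nodes ship with HTTP disabled by default
```

After changing it, redeploy the data nodes so the new environment takes effect.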
bw2 commented
that fixed it, thanks!