Failed to construct kafka producer
misterpilou opened this issue · 5 comments
The Kafka feature with all default args doesn't work. I have Kafka 1.1.0 running on localhost:9092.
Here is the error trace:
2018-04-27 01:35:40 ERROR Executor:95 [Executor task launch worker-0] - Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.&lt;init&gt;(KafkaProducer.java:335)
at org.apache.kafka.clients.producer.KafkaProducer.&lt;init&gt;(KafkaProducer.java:188)
at edu.usc.irds.sparkler.pipeline.SparklerProducer.&lt;init&gt;(SparklerProducer.scala:45)
at edu.usc.irds.sparkler.pipeline.Crawler$$anonfun$storeContentKafka$1.apply(Crawler.scala:235)
at edu.usc.irds.sparkler.pipeline.Crawler$$anonfun$storeContentKafka$1.apply(Crawler.scala:234)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid url in bootstrap.servers: -ktp
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:45)
at org.apache.kafka.clients.producer.KafkaProducer.&lt;init&gt;(KafkaProducer.java:275)
... 14 more
This indicates an error in the CLI args. Please paste the full command line that you used to invoke Sparkler.
Also, if you edited the Sparkler config (the .yaml file), please let me know.
`build/bin/sparkler.sh crawl -id 26-04-2018-00-16 -ke -kls -ktp`. After a little experimenting, the error happens when `-ktp` comes after `-kls`.
I did edit sparkler-default.yaml, but I reverted to master after stashing the changes.
Hmm, here is how things work:
- `-ke` is a boolean flag. When it is missing, Kafka is disabled; when it is added to the CLI, it turns on the flag that enables Kafka. You need to include `-ke`, as you have done, to enable it.
- `-kls` is short for Kafka listeners. It accepts `host:port`. The default is `localhost:9092`. If you want to keep it the same, you don't need to do anything. If you want to change it to `host1:6060`, then `-kls host1:6060` should be added to the command.
- `-ktp` is short for Kafka topic. It is the topic name. The default is `sparkler_<jobid>`; since `26-04-2018-00-16` is your job id, your default topic will be `sparkler_26-04-2018-00-16`. If you want to change it, add `-ktp topicname`. If you want your job id to be included, add `%s` in the string to get it replaced.
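The topic-name defaulting and `%s` substitution described above can be sketched in a few lines. This is a simplified illustration, not Sparkler's actual code; `resolve_topic` is a hypothetical name:

```python
# Hypothetical sketch of -ktp defaulting and %s substitution.
# Not Sparkler's real implementation; names are illustrative only.

def resolve_topic(job_id, ktp=None):
    """Return the Kafka topic name for a crawl job."""
    if ktp is None:
        # No -ktp given: default to sparkler_<jobid>
        return "sparkler_%s" % job_id
    # -ktp given: replace %s (if present) with the job id
    return ktp.replace("%s", job_id)

print(resolve_topic("26-04-2018-00-16"))              # sparkler_26-04-2018-00-16
print(resolve_topic("26-04-2018-00-16", "crawl_%s"))  # crawl_26-04-2018-00-16
print(resolve_topic("26-04-2018-00-16", "mytopic"))   # mytopic
```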
Here is what went wrong in your command-line args: since you tried to customize the Kafka address with `-kls -ktp`, the token `-ktp` is read as the value of `-kls`, and it doesn't match the `host:port` pattern (an example of a correct value is `localhost:9092`).
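The failure mode can be reproduced with a minimal `host:port` check. This is my own simplified validator, loosely mimicking what Kafka's `ClientUtils.parseAndValidateAddresses` rejects, not its actual code:

```python
def parse_bootstrap_server(url):
    """Minimal host:port validation, loosely mimicking the check that
    Kafka applies to bootstrap.servers. Illustrative only."""
    host, _, port = url.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("Invalid url in bootstrap.servers: " + url)
    return host, int(port)

# With "-kls -ktp", the argument parser hands "-ktp" to -kls as its value:
try:
    parse_bootstrap_server("-ktp")
except ValueError as e:
    print(e)  # Invalid url in bootstrap.servers: -ktp

# A well-formed value passes:
print(parse_bootstrap_server("localhost:9092"))  # ('localhost', 9092)
```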
So, simple things first:
- Just add `-ke` and leave the rest to the defaults, which are defined in the config file.
- Listen to `sparkler_26-04-2018-00-16` to receive the data.
Ah, okay, thanks for the information. So this is not an issue at all.