dibbhatt/kafka-spark-consumer

AbstractMethodError with Spark 1.6.0 and Kafka 0.10.2

MLNW opened this issue · 9 comments

MLNW commented

I'm trying to use this library with older versions of Spark (1.6.0-cdh5.11.1) and Kafka (0.10.2-kafka-2.2.0), but when I try to persist the offsets after the application logic has run, I get the error below.

It looks to me like a Scala version mismatch. It's not easy for me to switch to Scala 2.11, so my question is: is there a way to make your library work with my versions?

Below are the observed exception, the relevant parts of my pom file, and a sketch of how I wire up the consumer:

java.lang.AbstractMethodError: consumer.kafka.PartitionOffsetPair.call(Ljava/lang/Object;)Ljava/lang/Iterable;
	at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$4$1.apply(JavaDStreamLike.scala:205)
	at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$4$1.apply(JavaDStreamLike.scala:205)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/11/17 12:02:52 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.AbstractMethodError: consumer.kafka.PartitionOffsetPair.call(Ljava/lang/Object;)Ljava/lang/Iterable;
	at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$4$1.apply(JavaDStreamLike.scala:205)
	at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$4$1.apply(JavaDStreamLike.scala:205)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/11/17 12:02:52 INFO storage.DiskBlockManager: Shutdown hook called
		<dependency>
			<groupId>org.scala-lang</groupId>
			<artifactId>scala-library</artifactId>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.10</artifactId>
			<exclusions>
				<exclusion>
					<groupId>org.scala-lang</groupId>
					<artifactId>scala-library</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-streaming_2.10</artifactId>
			<exclusions>
				<exclusion>
					<groupId>org.apache.kafka</groupId>
					<artifactId>kafka_2.10</artifactId>
				</exclusion>
				<exclusion>
					<groupId>org.scala-lang</groupId>
					<artifactId>scala-library</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
		<dependency>
			<groupId>org.apache.hbase</groupId>
			<artifactId>hbase-spark</artifactId>
		</dependency>
		<dependency>
			<groupId>org.apache.kafka</groupId>
			<artifactId>kafka_2.10</artifactId>
			<exclusions>
				<exclusion>
					<groupId>org.apache.zookeeper</groupId>
					<artifactId>zookeeper</artifactId>
				</exclusion>
				<exclusion>
					<groupId>log4j</groupId>
					<artifactId>log4j</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
		<dependency>
			<groupId>dibbhatt</groupId>
			<artifactId>kafka-spark-consumer</artifactId>
			<version>1.0.12</version>
		</dependency>
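
For context, my offset-persisting call follows the README pattern, roughly like the sketch below (the ReceiverLauncher / ProcessedOffsetManager class names and signatures are what I understand from the project README; this is a sketch, not my exact job code):

    // Sketch only: class names and signatures assumed from the project README
    Properties props = new Properties();   // ZK / Kafka connection properties omitted here

    JavaDStream<MessageAndMetadata> unionStreams =
        ReceiverLauncher.launch(jssc, props, numberOfReceivers, StorageLevel.MEMORY_ONLY());

    // Collect the per-partition offsets of each processed batch
    JavaPairDStream<Integer, Iterable<Long>> partitionOffset =
        ProcessedOffsetManager.getPartitionOffset(unionStreams, props);

    // ... application logic on unionStreams ...

    // Persisting the processed offsets is the step that throws the AbstractMethodError
    ProcessedOffsetManager.persists(partitionOffset, props);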

Which version of the consumer are you running? This happens when your Spark version (1.6.0) and the Spark version in the consumer's pom don't match. You can git clone the code and update the consumer's pom to match your versions and try. With Spark 1.6 you may see a couple of compilation issues, which are easy to solve.

Here are the steps you can try.

  1. git clone the latest code.

  2. Modify pom.xml to match your Kafka and Spark versions (including the Scala version).

e.g.

<spark.version>1.6.0</spark.version>
<kafka.version>0.10.2.0</kafka.version>

  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>

and

  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>

and

  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
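
Putting that together, the relevant sections of the consumer's pom.xml end up roughly like this (layout is illustrative; the actual pom may organize these slightly differently):

    <properties>
      <spark.version>1.6.0</spark.version>
      <kafka.version>0.10.2.0</kafka.version>
    </properties>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.10</artifactId>
      <version>${kafka.version}</version>
    </dependency>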
  3. Spark 1.6 and 2.0 have some incompatible changes, so you need to remove one listener callback from consumer.kafka.ReceiverStreamListener.java.

Remove this import:

import org.apache.spark.streaming.scheduler.StreamingListenerStreamingStarted;

and remove this callback:

@Override
public void onStreamingStarted(StreamingListenerStreamingStarted arg0) {
}
  4. Spark 1.6 and 2.0 have another incompatibility in the return type of PairFlatMapFunction, so you need to modify consumer.kafka.PartitionOffsetPair.java.

Change the return type of the call method from

public Iterator<Tuple2<Integer, Long>> call(Iterator<MessageAndMetadata> it)

to

public Iterable<Tuple2<Integer, Long>> call(Iterator<MessageAndMetadata> it)

And change the return statement from

    return kafkaPartitionToOffsetList.iterator();

to

    return kafkaPartitionToOffsetList;
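
For reference, after these two changes the method ends up shaped roughly like the sketch below (illustrative only, not the exact library source; the MessageAndMetadata accessor names are assumed and the per-partition bookkeeping is simplified):

    // Spark 1.6 sketch: PairFlatMapFunction.call must return an Iterable,
    // not an Iterator as in Spark 2.0.
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import org.apache.spark.api.java.function.PairFlatMapFunction;
    import scala.Tuple2;
    import consumer.kafka.MessageAndMetadata;

    public class PartitionOffsetPair implements
        PairFlatMapFunction<Iterator<MessageAndMetadata>, Integer, Long> {

      @Override
      public Iterable<Tuple2<Integer, Long>> call(Iterator<MessageAndMetadata> it) {
        // Track the highest offset seen per Kafka partition in this RDD partition
        Map<Integer, Long> maxOffsets = new HashMap<Integer, Long>();
        while (it.hasNext()) {
          MessageAndMetadata msg = it.next();
          int partition = msg.getPartition();   // accessor name assumed
          long offset = msg.getOffset();        // accessor name assumed
          Long seen = maxOffsets.get(partition);
          if (seen == null || offset > seen) {
            maxOffsets.put(partition, offset);
          }
        }
        List<Tuple2<Integer, Long>> kafkaPartitionToOffsetList =
            new ArrayList<Tuple2<Integer, Long>>();
        for (Map.Entry<Integer, Long> entry : maxOffsets.entrySet()) {
          kafkaPartitionToOffsetList.add(
              new Tuple2<Integer, Long>(entry.getKey(), entry.getValue()));
        }
        // Spark 1.6: return the Iterable itself, not list.iterator()
        return kafkaPartitionToOffsetList;
      }
    }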

That's it. Build the consumer and you should be all set to use it with Spark 1.6 and Kafka 0.10.2.

Let me know if you face any issues.

Dibyendu

Alternatively, you can use consumer version 1.0.9, which works with Spark 1.6:

	<dependency>
		<groupId>dibbhatt</groupId>
		<artifactId>kafka-spark-consumer</artifactId>
		<version>1.0.9</version>
	</dependency>
MLNW commented

Thank you for your quick response!

I used your first approach and modified the latest code to use my versions of Kafka, Spark and Scala. It seems to work.

I will do some more extensive testing this week. If I find anything else, I'll let you know.

Cheers!

Perfect. Do let me know if you see any issues or need any help tuning the various knobs.

When the Spark job was submitted, the system loaded the default CDH assembly jar (spark-assembly-1.6.0-cdh5.14.4-hadoop2.6.0-cdh5.14.4.jar), so the Kafka version picked up is not 0.10 but 0.9.0.

Hi @LinMingQiang, which versions of the jars have you specified in your application pom?

Spark 1.6.0, Kafka 0.10.0

What's the issue you see? Is the streaming job not running?