databricks/spark-corenlp

java.lang.NoSuchMethodError with Scala/Spark 2.10


I'm getting the same error that raghugvt posted here. He solved the problem by bundling everything together in one jar, but that's not an option for me as I would like to use spark-corenlp in a notebook.

My build.sbt is as follows:


version := "1.0"

scalaVersion := "2.10.6"

resolvers += "Spark Packages Repository" at "https://dl.bintray.com/spark-packages/maven/"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "2.1.0",
  "org.apache.spark" % "spark-sql_2.10" % "2.1.0",
  "com.databricks" % "spark-csv_2.10" % "1.5.0",
  "org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
)

libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.7.0" withSources() withJavadoc()
libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.7.0" classifier "models"
libraryDependencies += "databricks" % "spark-corenlp" % "0.2.0-s_2.11"

I'm testing with this script:

import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder().master("local")
  .appName("Spark SQL basic example")
  .config("master", "spark://myhost:7077")
  .getOrCreate()

val sqlContext = spark.sqlContext

import sqlContext.implicits._

val input = Seq(
  (1, "<xml>Stanford University is located in California. It is a great university.</xml>")
).toDF("id", "text")

val output = input
  .select(cleanxml('text).as('doc))
  .select(explode(ssplit('doc)).as('sen))
  .select('sen, tokenize('sen).as('words), ner('sen).as('nerTags), sentiment('sen).as('sentiment))

output.show(truncate = false)

Which results in the error:

java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
	at com.databricks.spark.corenlp.functions$.cleanxml(functions.scala:54)

What is going wrong here?

I am facing the exact same issue. Can someone please help out here?

@warrenronsiek were you able to resolve this issue?

@saurabh14rajput I ended up not using this library. Instead I created a workaround with a UDF that wraps the features of Stanford NLP that I wanted to use. Probably not super efficient or best practice, but it turns out to be relatively fast.
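Roughly along these lines, in case it helps anyone. This is just a minimal sketch, not my exact code; the object name, annotator list, and column names are illustrative:

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.sql.functions.{col, udf}
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}

// Keep the pipeline in an object so each executor JVM builds it once
// instead of trying to serialize it with the UDF closure.
object CoreNLPWrapper {
  lazy val pipeline: StanfordCoreNLP = {
    val props = new Properties()
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner")
    new StanfordCoreNLP(props)
  }
}

// UDF returning the NER tag of every token in the input text.
val nerUdf = udf { text: String =>
  val doc = new Annotation(text)
  CoreNLPWrapper.pipeline.annotate(doc)
  doc.get(classOf[CoreAnnotations.TokensAnnotation]).asScala
    .map(_.get(classOf[CoreAnnotations.NamedEntityTagAnnotation]))
    .toSeq
}

// e.g. applied to the input DataFrame from the snippet above
val tagged = input.withColumn("nerTags", nerUdf(col("text")))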

Okay. Thanks!

Hey, why do you include corenlp in your dependencies? It is already pulled in by spark-corenlp, see https://github.com/databricks/spark-corenlp/blob/master/build.sbt#L39. Can you try corenlp version 3.6.0 and report back if you still have issues?

Hey, I think the issue is that you are mixing the Scala 2.11 build of spark-corenlp with Scala 2.10; the NoSuchMethodError on scala.reflect is the usual symptom of loading code compiled against a different Scala binary version. You should replace the line

libraryDependencies += "databricks" % "spark-corenlp" % "0.2.0-s_2.11"

with

libraryDependencies += "databricks" % "spark-corenlp" % "0.2.0-s_2.10"
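With that change, every artifact in the build agrees on the Scala binary version. Roughly like this (just a sketch, versions copied from your snippet):

scalaVersion := "2.10.6"

resolvers += "Spark Packages Repository" at "https://dl.bintray.com/spark-packages/maven/"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10"  % "2.1.0",
  "org.apache.spark" % "spark-sql_2.10"   % "2.1.0",
  "org.apache.spark" % "spark-mllib_2.10" % "2.1.0",
  "com.databricks"   % "spark-csv_2.10"   % "1.5.0",
  // the -s_2.10 suffix must match scalaVersion above
  "databricks"       % "spark-corenlp"    % "0.2.0-s_2.10",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.7.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.7.0" classifier "models"
)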

I'm getting the same error. I cloned this repo, ran sbt package to build the jar, then invoked spark-shell like this:

/opt/spark/spark-2.0.1/bin/spark-shell --jars ~/spark-corenlp_2.10-0.3.0-SNAPSHOT.jar

I get the error even if I specify library dependencies, like this:

/opt/spark/spark-2.0.1/bin/spark-shell --jars ~/spark-corenlp_2.10-0.3.0-SNAPSHOT.jar --packages databricks:spark-corenlp:0.2.0-s_2.10,edu.stanford.nlp:stanford-corenlp:3.7.0

@iandow try with spark-corenlp_2.11. Spark 2.x uses Scala 2.11 by default.
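I.e. rebuild the jar for 2.11 and keep the launch command consistent, something like this (untested, and it assumes the build cross-compiles for 2.11):

sbt ++2.11.8 package

/opt/spark/spark-2.0.1/bin/spark-shell \
  --jars target/scala-2.11/spark-corenlp_2.11-0.3.0-SNAPSHOT.jar \
  --packages edu.stanford.nlp:stanford-corenlp:3.7.0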

No dice. Same error.

@zouzias using

libraryDependencies += "databricks" % "spark-corenlp" % "0.2.0-s_2.10"

solved the problem for the example I posted above. I can't speak for the other people who are getting the same error.