linkedin/isolation-forest

InvalidClassException

ikoiko opened this issue · 12 comments

Hello,
I used your build configuration and successfully built the jar file (isolation-forest_2.11-0.3.1) with gradlew. However, when I use the newly generated jar in my project, it throws an error while fitting data.
Error detail:
"Caused by: java.io.InvalidClassException: com.linkedin.relevance.isolationforest.IsolationForest; local class incompatible: stream classdesc serialVersionUID = 5883725353499012901, local class serialVersionUID = 6413710209040362293"

I built the source code in the following environment:

  • Virtual box Ubuntu Linux
  • Scala 2.11.11
  • Spark 2.4.4

My build.gradle file is as follows:
plugins {
    // Apply the scala plugin to add support for Scala
    id 'scala'
}

dependencies {
    compile("com.chuusai:shapeless_2.11:2.3.2")
    // compile("com.databricks:spark-avro_2.11:4.0.0")
    compile("org.apache.spark:spark-avro_2.11:2.4.0")
    compile("org.apache.spark:spark-core_2.11:2.4.0")
    compile("org.apache.spark:spark-mllib_2.11:2.4.0")
    compile("org.apache.spark:spark-sql_2.11:2.4.0")
    compile("org.scalatest:scalatest_2.11:2.2.6")
    compile("org.testng:testng:6.8.8")
}

test {
    useTestNG()
}

archivesBaseName = "${project.name}_2.11"

Can you please help me solve this issue?

P.S. I followed exactly the same steps to build release v0.2.2 (isolation-forest_2.11-0.2.3) and it works perfectly. Only v0.3.0 (isolation-forest_2.11-0.3.1) has the problem above.

Thanks in advance
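
A note for readers who hit the same exception: Java raises `InvalidClassException` with the "local class incompatible" message when the `serialVersionUID` recorded in the serialized stream differs from that of the class loaded by the JVM doing the deserialization. The sketch below prints the value a given JVM sees for a class. `SerialUidCheck` and `Demo` are hypothetical names used only for illustration and are not part of the isolation-forest library; `ObjectStreamClass.lookup` is the standard JDK call.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialUidCheck {
    // Hypothetical stand-in class so the sketch runs without any extra jars.
    static final class Demo implements Serializable {
        private static final long serialVersionUID = 42L;
    }

    /** Returns the serialVersionUID the current JVM associates with cls. */
    static long uidOf(Class<?> cls) {
        // lookup() returns null when the class is not Serializable.
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) {
            throw new IllegalArgumentException(cls.getName() + " is not Serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to the stand-in class.
        String name = args.length > 0 ? args[0] : Demo.class.getName();
        Class<?> cls = Class.forName(name);
        System.out.println(name + " serialVersionUID = " + uidOf(cls));
    }
}
```

Assuming the isolation-forest jar and its Spark dependencies are on the classpath, running this with `com.linkedin.relevance.isolationforest.IsolationForest` as the argument, once on the build machine and once on the cluster, should show which side reports which of the two IDs from the error message.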

Thanks, @ikoiko! I'll take a look.

It looks like Spark 2.4.4 may not support Scala 2.11.

"For the Scala API, Spark 2.4.4 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x)."

https://spark.apache.org/docs/latest/

Are you able to use an earlier version of Spark? The library has been most extensively tested with Scala 2.11.8 and Spark 2.3.0.

Alternatively, you should try bumping all of the dependency versions to 2.4.4 for compatibility with the Spark version you're using on your cluster.

plugins {
    // Apply the scala plugin to add support for Scala
    id 'scala'
}

dependencies {
    compile("com.chuusai:shapeless_2.11:2.3.2")
    // compile("com.databricks:spark-avro_2.11:4.0.0")
    compile("org.apache.spark:spark-avro_2.11:2.4.4")
    compile("org.apache.spark:spark-core_2.11:2.4.4")
    compile("org.apache.spark:spark-mllib_2.11:2.4.4")
    compile("org.apache.spark:spark-sql_2.11:2.4.4")
    compile("org.scalatest:scalatest_2.11:2.2.6")
    compile("org.testng:testng:6.8.8")
}

test {
    useTestNG()
}

archivesBaseName = "${project.name}_2.11"

Hi @jverbus
Thanks for the reply. As far as I remember, I already bumped the dependencies as you describe above, but it didn't change anything. I am not 100% sure about that, though, so I will definitely give it another try tomorrow. On the other hand, I have no way to change the Spark/Scala versions running in production (we are still using Spark 2.4.0 and Scala 2.11.11). Because our production servers are isolated from the internet, I am using a virtual Linux machine as a test and build environment. I will also try changing the Spark version on that machine to 2.4.0 so that the library is built against a matching setup.

thanks

@ikoiko: Cool, please let me know if it works.

@ikoiko: Any success?

Hi @jverbus
Sorry for the late reply. We have just entered a very busy period, so I couldn't get back to you sooner. I configured my virtual environment to use Spark 2.4.0 and Scala 2.11.11 (same as my production environment) and set the build.gradle file to use the Spark 2.4.0 dependencies, but it failed with the same error. I am planning to build again with Spark 2.4.4 and Scala 2.12.x. I will let you know whether it works.

Here is my version matrix and the results so far:

Spark | Scala | Spark dependencies in build.gradle | Result
2.4.4 | 2.11.11 | 2.4.4 | FAIL
2.4.4 | 2.11.11 | 2.4.0 | FAIL
2.4.0 | 2.11.11 | 2.4.0 | FAIL
2.4.4 | 2.12.x | 2.4.4 | FAIL
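
Every combination in the matrix fails the same way, so it may be worth ruling out that a different copy of the library is picked up at runtime than the one that was just built (this is only a guess, not a confirmed cause). The sketch below prints where the JVM loaded a class from. `ClassOrigin` is a hypothetical helper name; the calls it uses are standard JDK APIs.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

final class ClassOrigin {
    /**
     * Returns the jar or directory a class was loaded from, or empty when
     * the JVM does not expose one (for example, for core JDK classes).
     */
    static Optional<URL> locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null
                ? Optional.empty()
                : Optional.ofNullable(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to this helper itself.
        String name = args.length > 0 ? args[0] : ClassOrigin.class.getName();
        Optional<URL> location = locationOf(Class.forName(name));
        System.out.println(name + " loaded from "
                + location.map(URL::toString).orElse("(unknown)"));
    }
}
```

If the path printed for `com.linkedin.relevance.isolationforest.IsolationForest` on the cluster is not the freshly built jar, an older jar on the classpath would be one possible explanation for the mismatched IDs.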

Hi @jverbus

I have tried with Scala 2.12.0, but no luck.

Hi @ikoiko ,

I spun up an Azure Spark cluster, but wasn't able to reproduce your reported issue.

I tried with Spark 2.4.0 and Scala 2.11.12 on Ubuntu 16.04. I set the build.gradle dependencies to 2.4.0.

I was able to build the jar and use it on the cluster to fit an isolation forest to both the shuttle.csv and mammography.csv datasets that are included in the git repo.

Are you able to try on a different cluster?

Hi @jverbus

Unfortunately, I can't. We don't have any other cluster.

I'm going to close this as I'm not able to reproduce the issue.