Update demos to Iceberg 0.13.1 / Nessie 0.19.0
Closed this issue · 9 comments
@nastra: Should it be Nessie 0.19.0, since 0.18.0 ships the old Nessie Spark extensions?
It needs to be Nessie 0.18.0 I believe (the version that is being used in Iceberg 0.13.0), but worth trying if we can use Nessie 0.19.0
> It needs to be Nessie 0.18.0 I believe (the version that is being used in Iceberg 0.13.0), but worth trying if we can use Nessie 0.19.0
But the version that is being used by Iceberg doesn't have the latest Nessie Spark extensions, due to our circular dependency. So I think it cannot be 0.18.0, because the demos need the Nessie Spark extensions.
Maybe the Flink demo could be updated to Flink 1.14.0 along the way, due to apache/iceberg@2d4b0dd.
It was noticed that the demo links to https://projectnessie.org/tools/spark/, but it should link to https://projectnessie.org/tools/iceberg/spark/, so we should update that as well.
Note to self: Iceberg 0.13.1 can be used with Nessie 0.18.0:
https://github.com/apache/iceberg/blob/apache-iceberg-0.13.1/versions.props
Note to @XN137: You can try with Nessie 0.18.0, but I think the Nessie Spark session extensions were upgraded in 0.19.0, so it will not work.
There is a circular dependency:
Nessie makes a release (0.18.0) -> Iceberg makes a release with that Nessie version (0.13.1) -> Nessie updates its Spark extensions against that Iceberg version (0.19.0).
If it works on 0.18.0, let me know as well. I am curious to see whether my understanding is right or not.
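The circular release chain described above can be sketched as a toy ordering check (illustration only, not real build tooling; the version numbers come from this thread):

```python
# Toy model of the circular release chain: each release is built against
# the previous one, so the Nessie Spark extensions only match the Iceberg
# release they were rebuilt against.
releases = [
    ("nessie", "0.18.0", None),                   # Nessie releases first
    ("iceberg", "0.13.1", ("nessie", "0.18.0")),  # Iceberg pins that Nessie
    ("nessie", "0.19.0", ("iceberg", "0.13.1")),  # extensions rebuilt on that Iceberg
]

def extensions_match(iceberg_version: str, nessie_extensions_version: str) -> bool:
    """True if the given Nessie extensions release was built against the
    given Iceberg release (the last step of the chain)."""
    chain = {(proj, ver): dep for proj, ver, dep in releases}
    return chain.get(("nessie", nessie_extensions_version)) == ("iceberg", iceberg_version)

# Iceberg 0.13.1 needs the Nessie 0.19.0 extensions, not 0.18.0:
print(extensions_match("0.13.1", "0.19.0"))  # True
print(extensions_match("0.13.1", "0.18.0"))  # False
```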
> You can try with nessie 0.18.0, but I think nessie spark session extensions were upgraded in 0.19.0 so it will not work.
> If it works on 0.18.0, let me know also. I am also curious to see whether my understanding is right or not.
@ajantha-bhat I think you are right: we need to use Nessie 0.19.0 in order to use Iceberg 0.13.1.
On 0.18.0 I was getting this error in the Spark demo:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
/tmp/ipykernel_345/2286535781.py in <module>
----> 1 spark.sql("CREATE BRANCH dev IN dev_catalog FROM main").toPandas()
(... noise cut out...)
Py4JJavaError: An error occurred while calling o42.sql.
: java.lang.NoClassDefFoundError: org/projectnessie/client/NessieClient
at org.apache.spark.sql.execution.datasources.v2.NessieUtils$.nessieClient(NessieUtils.scala:183)
at org.apache.spark.sql.execution.datasources.v2.NessieExec.run(NessieExec.scala:29)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.projectnessie.client.NessieClient
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 31 more
Upgrading to 0.19.0 fixed it.
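For reference, a minimal sketch of Spark settings pairing the matching versions. The artifact coordinates, catalog settings, and Nessie URI are assumptions based on typical Iceberg/Nessie setups, not copied from the actual demo notebooks; only the catalog name dev_catalog (from the CREATE BRANCH statement above) and the version numbers come from this thread.

```python
# Hedged sketch: Spark configuration pairing Iceberg 0.13.1 with the
# Nessie 0.19.0 Spark extensions. Coordinates and URI are assumed values.
ICEBERG_VERSION = "0.13.1"
NESSIE_VERSION = "0.19.0"

spark_conf = {
    # Mixing Iceberg 0.13.1 with the 0.18.0 extensions produces the
    # NoClassDefFoundError shown above. (Assumed artifact coordinates.)
    "spark.jars.packages": ",".join([
        f"org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{ICEBERG_VERSION}",
        f"org.projectnessie:nessie-spark-3.2-extensions:{NESSIE_VERSION}",
    ]),
    "spark.sql.extensions": ",".join([
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
    ]),
    # dev_catalog matches the CREATE BRANCH statement in the traceback.
    "spark.sql.catalog.dev_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.dev_catalog.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
    "spark.sql.catalog.dev_catalog.uri": "http://nessie:19120/api/v1",  # assumed
    "spark.sql.catalog.dev_catalog.ref": "main",
}
```

Each entry would be passed to SparkSession.builder via .config(key, value) before running the CREATE BRANCH statement.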
Does this confirm your understanding?
Is the circular dependency explained somewhere?
I can see https://github.com/projectnessie/nessie/blob/nessie-0.18.0/clients/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/NessieUtils.scala#L182-L185 referring to NessieClient, even though https://github.com/projectnessie/nessie/blob/nessie-0.18.0/clients/client/src/main/java/org/projectnessie/client/api/NessieApiV1.java already exists in that version.
Only in 0.19.0 is NessieApiV1 actually used in the Spark extensions:
https://github.com/projectnessie/nessie/blob/nessie-0.19.0/clients/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/NessieUtils.scala#L178-L181
@XN137: Thanks for validating. So my understanding is correct.
> is the circular dependency explained somewhere?
We can document it. But only the SQL extensions are affected; everything else works (the REST API, Java API, CLI). The circular dependency exists only for this module.
I think the background is that Iceberg didn't want to keep these Nessie SQL extensions and their integration tests, so we keep them in Nessie, with a circular dependency on Iceberg.