Update demos to Iceberg 0.13.1 / Nessie 0.19.0
Closed this issue · 9 comments
@nastra: Should it be Nessie 0.19.0, since 0.18.0 ships the old Nessie Spark extensions?
It needs to be Nessie 0.18.0 I believe (the version that is being used in Iceberg 0.13.0), but worth trying if we can use Nessie 0.19.0
> It needs to be Nessie 0.18.0 I believe (the version that is being used in Iceberg 0.13.0), but worth trying if we can use Nessie 0.19.0
But the version that is being used by Iceberg doesn't have the latest Nessie Spark extensions, due to our circular dependency. So I think it cannot be 0.18.0, because the demos need the Nessie Spark extensions.
Maybe the Flink demo could be updated to Flink 1.14.0 along the way, due to apache/iceberg@2d4b0dd.
It was noticed that the demo links to https://projectnessie.org/tools/spark/, but it should link to https://projectnessie.org/tools/iceberg/spark/, so we should update that as well.
Note to self: Iceberg 0.13.1 can be used with Nessie 0.18.0:
https://github.com/apache/iceberg/blob/apache-iceberg-0.13.1/versions.props
Note to @XN137: You can try with Nessie 0.18.0, but I think the Nessie Spark session extensions were upgraded in 0.19.0, so it will not work.
There is a circular dependency:
Nessie makes a release (0.18.0) -> Iceberg makes a release with that Nessie version (0.13.1) -> Nessie updates its Spark extensions against that Iceberg version (0.19.0).
If it works on 0.18.0, let me know as well. I am curious to see whether my understanding is right or not.
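The circular release chain described above can be sketched as a toy ordering check (illustration only, not real build tooling; the version numbers come from this thread):

```python
# Toy model of the circular release chain: each release is built against
# the previous one, so the Nessie Spark extensions only match the Iceberg
# release they were rebuilt against.
releases = [
    ("nessie", "0.18.0", None),                   # Nessie releases first
    ("iceberg", "0.13.1", ("nessie", "0.18.0")),  # Iceberg pins that Nessie
    ("nessie", "0.19.0", ("iceberg", "0.13.1")),  # extensions rebuilt on that Iceberg
]

def extensions_match(iceberg_version: str, nessie_extensions_version: str) -> bool:
    """True if the given Nessie extensions release was built against the
    given Iceberg release (the last step of the chain)."""
    chain = {(proj, ver): dep for proj, ver, dep in releases}
    return chain.get(("nessie", nessie_extensions_version)) == ("iceberg", iceberg_version)

# Iceberg 0.13.1 needs the Nessie 0.19.0 extensions, not 0.18.0:
print(extensions_match("0.13.1", "0.19.0"))  # True
print(extensions_match("0.13.1", "0.18.0"))  # False
```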
> You can try with nessie 0.18.0, but I think nessie spark session extensions were upgraded in 0.19.0 so it will not work.
> If it works on 0.18.0, let me know also. I am also curious to see whether my understanding is right or not.
@ajantha-bhat I think you are right: we need to use Nessie 0.19.0 in order to use Iceberg 0.13.1.
On 0.18.0 I was getting this error in the Spark demo:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
/tmp/ipykernel_345/2286535781.py in <module>
----> 1 spark.sql("CREATE BRANCH dev IN dev_catalog FROM main").toPandas()
(... noise cut out...)
Py4JJavaError: An error occurred while calling o42.sql.
: java.lang.NoClassDefFoundError: org/projectnessie/client/NessieClient
at org.apache.spark.sql.execution.datasources.v2.NessieUtils$.nessieClient(NessieUtils.scala:183)
at org.apache.spark.sql.execution.datasources.v2.NessieExec.run(NessieExec.scala:29)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.projectnessie.client.NessieClient
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 31 more
Upgrading to 0.19.0 fixed it.
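For reference, a minimal sketch of Spark settings pairing the matching versions. The artifact coordinates, catalog settings, and Nessie URI are assumptions based on typical Iceberg/Nessie setups, not copied from the actual demo notebooks; only the catalog name dev_catalog (from the CREATE BRANCH statement above) and the version numbers come from this thread.

```python
# Hedged sketch: Spark configuration pairing Iceberg 0.13.1 with the
# Nessie 0.19.0 Spark extensions. Coordinates and URI are assumed values.
ICEBERG_VERSION = "0.13.1"
NESSIE_VERSION = "0.19.0"

spark_conf = {
    # Mixing Iceberg 0.13.1 with the 0.18.0 extensions produces the
    # NoClassDefFoundError shown above. (Assumed artifact coordinates.)
    "spark.jars.packages": ",".join([
        f"org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{ICEBERG_VERSION}",
        f"org.projectnessie:nessie-spark-3.2-extensions:{NESSIE_VERSION}",
    ]),
    "spark.sql.extensions": ",".join([
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
    ]),
    # dev_catalog matches the CREATE BRANCH statement in the traceback.
    "spark.sql.catalog.dev_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.dev_catalog.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
    "spark.sql.catalog.dev_catalog.uri": "http://nessie:19120/api/v1",  # assumed
    "spark.sql.catalog.dev_catalog.ref": "main",
}
```

Each entry would be passed to SparkSession.builder via .config(key, value) before running the CREATE BRANCH statement.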
Does this confirm your understanding?
Is the circular dependency explained somewhere?
I can see https://github.com/projectnessie/nessie/blob/nessie-0.18.0/clients/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/NessieUtils.scala#L182-L185 referring to NessieClient, even though https://github.com/projectnessie/nessie/blob/nessie-0.18.0/clients/client/src/main/java/org/projectnessie/client/api/NessieApiV1.java already exists in that version.
Only in 0.19.0 is NessieApiV1 actually used in the Spark extensions:
https://github.com/projectnessie/nessie/blob/nessie-0.19.0/clients/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/NessieUtils.scala#L178-L181
@XN137: Thanks for validating. So my understanding is correct.
> is the circular dependency explained somewhere?
We can document it. But only the SQL extensions are affected; everything else works (the REST API, Java API, CLI). The circular dependency exists only for this module.
I think the background is that Iceberg didn't want to keep these Nessie SQL extensions and their integration tests, so we keep them in Nessie, with a circular dependency on Iceberg.