Error when running metorikku-standalone.jar: DataNucleus not found
Closed this issue · 7 comments
Hello,
I'm running metorikku-standalone.jar with hoodie-spark-bundle-0.4.7.jar locally and I'm getting the following error:
NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
Can you please help?
Thank you
Hi,
Can you send the full command you are using here, and the job file?
Thanks
Hi lyogev,
Thank you for getting back to me.
The metric file I'm using is exactly like this one:
metric.yaml: https://raw.githubusercontent.com/YotpoLtd/metorikku/master/examples/kafka/kafka2hudi_cdc.yaml
And the config file is exactly like this:
config.yaml: https://raw.githubusercontent.com/YotpoLtd/metorikku/master/examples/kafka/kafka_example_cdc.yaml
The only difference in the config file is that I replaced the kafka, schema-registry, and hive hosts with localhost:
- kafka: localhost:9092
- schema-registry: localhost:8081
- hive: localhost:10000
And here is the full command:
java -Dspark.master=local[*] -Dspark.serializer=org.apache.spark.serializer.KryoSerializer -cp hadoop-aws-2.7.5.jar -cp aws-java-sdk-1.7.4.jar -cp hoodie-spark-bundle-0.4.7.jar -cp metorikku-standalone.jar com.yotpo.metorikku.Metorikku -c config.yaml
Thank you
Can you add this jar to the classpath as well?
https://repo1.maven.org/maven2/org/datanucleus/datanucleus-core/3.2.10/datanucleus-core-3.2.10.jar
It's not being packaged with assembled spark
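One pitfall worth noting with the command quoted above: `java` does not accumulate repeated `-cp` flags, only the last one takes effect, so all the jars have to be joined into a single colon-separated classpath (on Linux/macOS). A minimal sketch, using the jar names from this thread and assuming they sit in the current directory:

```shell
# Build one colon-separated classpath; repeated -cp flags do NOT
# accumulate in java -- only the last -cp wins.
CP="datanucleus-core-3.2.10.jar:hoodie-spark-bundle-0.4.7.jar:metorikku-standalone.jar"

# Print the resulting command (quotes around local[*] prevent shell globbing).
echo java -Dspark.master='local[*]' -cp "$CP" com.yotpo.metorikku.Metorikku -c config.yaml
```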
Hi lyogev,
After adding a couple of DataNucleus libraries, that error went away. I can see data written to the output now. Thank you.
However, I'm getting a crash at a later point.
The error is: ERROR MicroBatchExecution: Query
Failed to get update last commit time synced
NoSuchObjectException(message:default.hoodie_test table not found)
Have you seen this error before? I'm trying to figure it out.
Also, I'm no longer passing hoodie-spark-bundle-0.4.7.jar on the classpath; I believe it's included in the standalone jar file.
Command:
java -Dspark.master=local[*] -Dspark.serializer=org.apache.spark.serializer.KryoSerializer -cp datanucleus-core-3.2.10.jar:datanucleus-api-jdo-3.2.8.jar:datanucleus-rdbms-3.2.9.jar:metorikku-standalone.jar com.yotpo.metorikku.Metorikku -c config.yaml
Thank you
Regarding hoodie, yes, it's not needed in the classpath, it's bundled already (in the standalone version).
Regarding the error: what you are missing here is hive configuration. Hoodie is heavily coupled to hive; you need spark to write to hive, and you need hoodie to write to hive.
So first your spark command needs to be aware of hive:
-Dspark.hadoop.hive.metastore.uris=thrift://hive:9083 -Dspark.sql.catalogImplementation=hive
Then in the job file make sure hoodie output is configured with:
hiveJDBCURL: jdbc:hive2://hive:10000
As you can see, spark communicates directly with the hive metastore, while hoodie communicates with the hive server.
I think you can skip the hive sync simply by omitting the tableName from the hive output in the metric file, but I never tested this.
If that doesn't work, omit tableName and add:
extraOptions:
  hoodie.table.name: test
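Putting those pieces together, the relevant part of the output configuration might look something like this. This is an illustrative sketch only: the surrounding key names (e.g. `outputOptions`) are assumed from the linked kafka2hudi_cdc.yaml example and may differ between Metorikku versions; only `hiveJDBCURL`, `tableName`, `extraOptions`, and `hoodie.table.name` come from this thread.

```yaml
# Sketch of the hoodie output settings discussed above (key layout assumed,
# not verified against a specific Metorikku version).
outputOptions:
  hiveJDBCURL: jdbc:hive2://localhost:10000   # hoodie talks to the hive server
  # tableName omitted here to skip the hive sync (untested, per the suggestion)
  extraOptions:
    hoodie.table.name: test
```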
Hi lyogev,
That was it. I simply added those 2 extra configs to the command and it worked.
Command:
java -Dspark.master=local[*] -Dspark.serializer=org.apache.spark.serializer.KryoSerializer -Dspark.hadoop.hive.metastore.uris=thrift://localhost:9083 -Dspark.sql.catalogImplementation=hive -cp datanucleus-core-3.2.10.jar:datanucleus-api-jdo-3.2.8.jar:datanucleus-rdbms-3.2.9.jar:metorikku-standalone.jar com.yotpo.metorikku.Metorikku -c config.yaml
I knew it had something to do with that config value. I had been modifying hive.metastore.uris in hive-site.xml all day, but it never worked. After seeing your suggestion, I added it to the command and it worked. I'm not sure why it wasn't able to pick up the value from hive-site.xml.
Thank you so much.
To use hive-site.xml files etc., you need to use the regular Metorikku version (not the standalone) with spark-submit; it then picks up the config folders.