java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SharedState.externalCatalog()Lorg/apache/spark/sql/catalyst/catalog/ExternalCatalog;
Opened this issue · 19 comments
Describe the bug
I am running standalone Spark 2.3 on an EC2 instance. I also have standalone Hive on the same instance; the ranger-hive-plugin is set up and its policies work fine over a Hive connection.
I carefully followed your instructions to set up Ranger for Spark SQL. The only step I skipped is modifying ExperimentalMethods.scala, presuming it is not required for testing.
I also built the spark-authorizer jar with "mvn clean package -Pspark-2.3".
To Reproduce
Steps to reproduce the behavior:
- Copied the ranger*.xml conf files to spark_home/conf, and the ranger-hive*.jar files to spark_home/jars along with the spark-authorizer jar; gave full permissions to all xml and jar files; modified the conf xml files as advised.
- Environment: standalone Spark, Hive, and Ranger, all on EC2.
- Ran the show databases command in spark-shell.
- See the error above.
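The steps above can be sketched as shell commands. This is only a sketch: SPARK_HOME, the jar names, and the glob patterns are assumptions to be adapted to the actual layout.

```shell
# Sketch of the setup steps above; paths and jar names are assumptions.
export SPARK_HOME=/opt/spark

# 1. Ranger plugin configuration into Spark's conf directory
cp ranger-hive-security.xml ranger-hive-audit.xml "$SPARK_HOME/conf/"

# 2. Ranger hive-plugin jars plus the spark-authorizer jar into Spark's jars
cp ranger-hive-plugin-*.jar spark-authorizer-*.jar "$SPARK_HOME/jars/"
chmod a+r "$SPARK_HOME"/conf/ranger-*.xml "$SPARK_HOME"/jars/*.jar

# 3. Reproduce: run SHOW DATABASES from spark-shell
echo 'spark.sql("show databases").show()' | "$SPARK_HOME/bin/spark-shell"
```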
Anything I missed?
My final goal is that Spark SQL will be queried from different SQL clients such as "SQuirreL SQL Client", "Cassandra", etc., and Hive policies should be enforced when they query the data. All clients will connect to Spark SQL using a string that looks like
jdbc:hive2://hostname:10015/databasename;ssl=true;sslTrustStore=/pathtofile.jks;trustStorePassword=abcd
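A connection string like that can be smoke-tested from the command line with beeline (which ships with Spark and Hive). The host, port, truststore path, and password below are the placeholders from the string above; the username is a made-up example.

```shell
# Smoke-test the JDBC endpoint with beeline; hostname, truststore path,
# password, and the user name "sparkuser" are placeholders/assumptions.
beeline -u "jdbc:hive2://hostname:10015/databasename;ssl=true;sslTrustStore=/pathtofile.jks;trustStorePassword=abcd" \
        -n sparkuser \
        -e "show databases;"
```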
Sorry… the master branch is not stable yet; most of the cases I verified are against Spark 2.1.2.
I would love to have it fixed as soon as my vacation ends. For now, you may switch to branch 2.3 or use the package I deployed for 2.3; more details are in the branch 2.3 README.
@NithK45 I tested this with spark-shell on YARN, building with mvn clean package -Pspark-2.1,
and it works fine with Spark 2.1.2/2.2.1/2.3.0. I don't have a standalone env for testing, but I guess you are using cluster mode, and standalone mode doesn't seem to have anything like YARN's distributed cache for deploying jars, so they may need to be copied manually to all worker nodes.
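For a multi-node standalone cluster, distributing the jar manually might look like the sketch below. It assumes passwordless SSH, an identical SPARK_HOME on every node, and the standard Spark 2.x `conf/slaves` worker list; on a single-node setup this step is moot.

```shell
# Copy the authorizer jar to every standalone worker node.
# Assumes passwordless SSH and the same SPARK_HOME layout everywhere;
# the jar file name is a placeholder.
JAR=spark-authorizer-2.1.1.jar
for host in $(cat "$SPARK_HOME/conf/slaves"); do
  scp "$JAR" "$host:$SPARK_HOME/jars/"
done
```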
@yaooqinn Thanks for testing that. I am using a single large EC2 node with standalone Spark 2.3 and a Hive 2.4 setup; the data is in S3. We don't have YARN or a Hadoop setup. I copied your jar into the spark_home/jars/ directory and restarted the Spark and Hive services.
I also modified your code to comment out the validation that causes the above error, then rebuilt and retested. It did not throw the error this time, but no Ranger policies were enforced from spark-shell.
I would also like to know whether you tested this by connecting to Spark SQL through a SQL client (like SQuirreL SQL Client or SQL Developer tools) with a JDBC connection string?
For SQL clients, I use Kyuubi, which is a multi-tenant JDBC/ODBC server powered by Spark SQL.
Hi!
Same error here... NoSuchMethodError :-(. Is there any solution? Thanks!
@jacibreiro please try v2.1.1 and follow the doc https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html
@yaooqinn, thanks for your quick answer. I'm using v2.1.1 (built from the master branch)... I'm also using Hive 2.3.2, Spark 2.4, and Ranger 1.2... Maybe the problem is the Ranger version? Have you tested Ranger versions higher than 0.5?
With the pyspark shell I don't get the NoSuchMethodError, but it still doesn't work... I have followed all the steps of the manual, but it seems the plugin is not connecting to Ranger (maybe because of the version issue I mentioned in the previous post). I think it is not connecting because I can't see the policy cache. With Hive there is a script to enable the plugin, but here I don't know when the communication between Spark and Ranger starts... :-S Is there maybe an extra step that is not in the documentation?
By the way, this is my ranger-hive-security-xml:
<property>
  <name>ranger.plugin.hive.policy.rest.url</name>
  <value>http://ranger-admin:6080</value>
</property>
<property>
  <name>ranger.plugin.hive.service.name</name>
  <value>cl1_hive</value>
</property>
<property>
  <name>ranger.plugin.hive.policy.cache.dir</name>
  <value>/tmp/cl1_hive/policycache</value>
</property>
<property>
  <name>ranger.plugin.hive.policy.pollIntervalMs</name>
  <value>5000</value>
</property>
<property>
  <name>ranger.plugin.hive.policy.source.impl</name>
  <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
</property>
Do you see something wrong?
Thanks!
@jacibreiro you are right. Higher versions of Ranger are built against newer Hive client jars than the Hive client Spark ships (1.2.1). We may have it fixed in https://issues.apache.org/jira/browse/RANGER-2128 later.
@yaooqinn I have used an old Ranger version (0.5.3), but it still doesn't work... I see nothing either in the policy cache dir or in the audit plugins sheet (in Ranger). So it seems to be something related to communication between Spark and Ranger, because it is not able to load the policies. I have followed all the steps described in https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html . Looking at my ranger-hive-security.xml (previous post), do you see anything missing?
@jacibreiro could you please detail about "still doesn't work..."
Sure @yaooqinn, I mean that I have followed every single step in the manual:
- installed version 0.5.3 of Ranger
- installed and configured the Hive plugin in Spark
- enabled the Hive plugin in spark-defaults.conf
And when I query Hive from Spark using the pyspark shell (following the instructions), nothing related to authorization happens. The behaviour is the same as before applying the plugin; policies are not applied.
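One possible cause of "nothing happens" is that the Authorizer optimizer rule was never registered with the session. A sketch of registering it from spark-shell instead of patching ExperimentalMethods.scala; the class name `org.apache.spark.sql.catalyst.optimizer.Authorizer` is taken from the spark-authorizer README for the pre-extension versions, so verify it against the jar actually built.

```shell
# Register the Authorizer rule in-session; the import path is an assumption
# based on the spark-authorizer README and may differ between versions.
"$SPARK_HOME"/bin/spark-shell <<'EOF'
import org.apache.spark.sql.catalyst.optimizer.Authorizer
spark.experimental.extraOptimizations ++= Seq(Authorizer)
spark.sql("show databases").show()  // should now be filtered by Ranger policies
EOF
```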
Maybe you should first check that the Ranger Admin is reachable, and let's start with the spark-sql script to see if anything goes wrong.
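That reachability check can be done before touching Spark at all. The host and service name below are the ones from the ranger-hive-security.xml posted earlier; the policy-download path is an assumption based on the REST endpoint Ranger plugins poll.

```shell
# 1. Is Ranger Admin up at all?
curl -sf http://ranger-admin:6080/ > /dev/null && echo "Ranger Admin reachable"

# 2. Can the hive service's policies be downloaded?
#    (endpoint path is an assumption based on Ranger's plugin REST API)
curl -s "http://ranger-admin:6080/service/plugins/policies/download/cl1_hive"

# 3. Then run a query through the spark-sql script and watch its logs
"$SPARK_HOME"/bin/spark-sql -e "show databases"
```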
I am getting the same error described in this thread.
Installation details:
- Spark 2.4.0 (I am using the built-in Thrift Server)
- Ranger 0.5.3
- spark-authorizer 2.1.1
When I add spark-authorizer to the /jars folder or to the application dependencies, I can't connect to the Thrift Server. Even when I test the connection in the Ranger Admin UI I get the same error.
Without spark-authorizer I can connect without any problem.
I followed your steps but maybe I am missing something... Do you have any advice?
Thanks!
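For reference, a minimal way to bring up the built-in Thrift Server with the authorizer jar attached might look like this sketch; the jar path is a placeholder and the port is taken from the JDBC string earlier in the thread.

```shell
# Start Spark's built-in Thrift Server with the authorizer jar on the
# classpath; jar path is a placeholder, port matches the JDBC string above.
"$SPARK_HOME"/sbin/start-thriftserver.sh \
  --jars /path/to/spark-authorizer-2.1.1.jar \
  --hiveconf hive.server2.thrift.port=10015
```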
Hi @yaooqinn
I am trying it on Spark 2.4 too, but it doesn't work. Even when I use the superuser hadoop to access any database, it always throws a permission error. And on Spark 2.2 it works!
My tests pass here with Spark 2.4.3, Hive 2.3.3, and Ranger 1.1.0: after merging the patch in, this error no longer occurs.