yaooqinn/spark-authorizer

Please suggest: where does the --proxy-user need to be created? Is it required to be created only in the Ranger UI, or under Hadoop as well?

RamSinha opened this issue · 10 comments

So I just created a user on the Hadoop side, and a user with the same name was created in Ranger, but the policies defined in Ranger aren't being applied.
I am confused about how the proxy user is recognized by Ranger. What is the logical mapping between the two users?

You can use either a proxy user or the login user. If you specify --proxy-user UserA, the runtime Spark user will be UserA; otherwise it will be the user part of the spark.yarn.principal configuration. If you are using another authentication method, just pay attention to the value of SparkContext.sparkUser.
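A quick way to confirm this from the shell (not from the project docs, just the public SparkContext API): the name that Ranger policies have to match is whatever the session reports as its Spark user.

```scala
// Started with, e.g.: spark-shell --proxy-user ram ...
// This is the identity the authorizer plugin will see.
scala> spark.sparkContext.sparkUser
// expected to be "ram" when --proxy-user ram is in effect; otherwise the
// user part of the principal, or the OS login user
```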

Thanks for the reply.
In our case we are running Spark on AWS EMR, and the spark.yarn.principal configuration is not set anywhere:

scala> spark.conf.get("spark.yarn.principal")
java.util.NoSuchElementException: spark.yarn.principal
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1992)
  at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
  ... 54 elided
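For reference, a non-throwing way to check that key (using only the public RuntimeConfig API) is getOption, which returns None when the configuration is unset:

```scala
// Returns None instead of throwing when spark.yarn.principal is not set.
scala> spark.conf.getOption("spark.yarn.principal")
```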

But even with this, the Ranger policies are not being enforced for this user.
We have enabled all the XML settings mentioned in the blog,
and we are using the command below to start the shell:
spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://127.0.0.1:10000/default" --driver-java-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dlog4j.configuration=file:/home/hadoop/optimus-log4j.properties" --jars /home/hadoop/spark-authorizer-2.2.0.jar --proxy-user ram

Any pointers would be really helpful.

Thanks for the pointer. BTW, we are already following the installation guidelines above.
Just one update: we are on the following versions:
Spark 2.4, Hive 2.3.4, Ranger 0.7.1

Would it cause any problems?

For Spark 2.4, see #14.
For Hive 2.3.4, make sure Spark's built-in Hive metastore client has no incompatibility issues with it (see the check sketched below).
For Ranger 0.7.1, you may need to fix incompatibility issues in the ranger-hive-plugin module.
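As a quick sanity check on the second point (a sketch using standard Spark SQL configs, nothing specific to spark-authorizer), you can ask the session which Hive metastore client version it is configured with; stock Spark 2.x defaults to the built-in 1.2.1 client unless spark.sql.hive.metastore.version is overridden, and EMR may override it:

```scala
// Effective Hive metastore client version for this session; compare it
// against the Hive 2.3.4 metastore you are connecting to.
scala> spark.conf.get("spark.sql.hive.metastore.version")
```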

Thanks for the pointers, I am looking into the ranger-hive-plugin module now.
One question: the setup document mentioned above has no mention of installing the ranger-hive-plugin. Does that mean we don't need to install the ranger-hive-plugin module separately?

Just follow section 2 of the doc, Applying Plugin to Apache Spark.

Still no luck.
I have now built a new EMR cluster with Spark 2.3 and followed all the instructions. I don't see any errors, but the policies still aren't being applied. Also, on the Ranger UI under the Audit section I don't see anything.

Also: when I installed the ranger-hive-plugin on a different EMR cluster using the link below,
Ranger+Installation+Guide
the policies are enforced on Hive queries (started from the Hive CLI).

When I try to build locally I get the warning below.
Authorizable.scala:57: fruitless type test: a value of type org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener cannot also be a org.apache.spark.sql.hive.HiveExternalCatalog
[WARNING]   case _: HiveExternalCatalog =>

Also, when I tried to run the same code from spark-shell I got the error below.
error: not found: type HiveExternalCatalog
       method.invoke(externalCatalog).asInstanceOf[HiveExternalCatalog]

It seems that spark-shell cannot access the package-private class HiveExternalCatalog.
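For what it's worth, one way to poke at it interactively (a sketch assuming Spark 2.4, where ExternalCatalogWithListener exists; none of this is part of spark-authorizer) is the REPL's :paste -raw mode, which compiles a snippet as if it were a source file inside org.apache.spark.sql.hive, so the package-private class becomes visible:

```scala
// In spark-shell, type ":paste -raw", paste this block, then press Ctrl-D.
package org.apache.spark.sql.hive

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener

// Hypothetical helper: because it is compiled inside the hive package,
// the package-private HiveExternalCatalog type is visible here, unlike
// on ordinary REPL lines.
object CatalogProbe {
  def hiveCatalog(spark: SparkSession): HiveExternalCatalog =
    spark.sharedState.externalCatalog match {
      // Spark 2.4 wraps the real catalog in a listener; unwrap it first.
      case wrapped: ExternalCatalogWithListener =>
        wrapped.unwrapped.asInstanceOf[HiveExternalCatalog]
      // Older versions expose the catalog directly.
      case other =>
        other.asInstanceOf[HiveExternalCatalog]
    }
}
```

Back at the normal prompt, org.apache.spark.sql.hive.CatalogProbe.hiveCatalog(spark) should then resolve, whereas the bare cast in the error above cannot.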