apache/kyuubi

[Bug] [spark-hive-connector] fails to set hive.metastore.uris if `spark.sql.hive.metastore.jars` is not set

FANNG1 opened this issue · 5 comments

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

  1. start two Hive metastores: 127.0.0.1:9083 as hive1 and 127.0.0.1:19083 as hive2
  2. start the Spark SQL client, setting the default Hive metastore address to hive2 and the Hive metastore address of hive_catalog to hive1:
./bin/spark-sql -v \
--conf spark.sql.catalog.hive_catalog="org.apache.kyuubi.spark.connector.hive.HiveTableCatalog" \
--conf spark.sql.catalog.hive_catalog.hive.metastore.uris=thrift://127.0.0.1:9083 \
--conf spark.sql.catalog.hive_catalog.hive.metastore.port=9083 \
--conf spark.hadoop.hive.metastore.uris=thrift://127.0.0.1:19083 
  3. run SQL statements: after `use hive_catalog`, `show databases` retrieves the databases from hive2, not hive1.

Affects Version(s)

1.8.1

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.

DON'T use spark-sql to test hive-related stuff; there is a lot of trickiness inside. Is it reproducible with spark-shell?

The main reason is that if `spark.sql.hive.metastore.jars` is not specified (it defaults to `builtin`), `HiveClientImpl` reuses a shared `SessionState` to create its Hive client, and that shared `SessionState` is initialized first by the `spark_catalog` catalog, whose Hive client points to hive2 in this case. The simplified snippets from Spark below show the chain:

  // `isolationOn` is true if `spark.sql.hive.metastore.jars` is `builtin` and the
  // current SessionState is NOT a CliSessionState; spark-sql runs with a
  // CliSessionState, so isolation is off there
  def isCliSessionState(): Boolean = {
    val state = SessionState.get
    var temp: Class[_] = if (state != null) state.getClass else null
    var found = false
    while (temp != null && !found) {
      found = temp.getName == "org.apache.hadoop.hive.cli.CliSessionState"
      temp = temp.getSuperclass
    }
    found
  }
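
For context, this is roughly how Spark's HiveUtils applies that check when building the client loader (a simplified sketch of the builtin branch; names approximate the Spark source, so treat it as illustrative rather than verbatim):

  // Simplified from org.apache.spark.sql.hive.HiveUtils.newClientForMetadata:
  // with builtin jars, isolation is off exactly when the current SessionState
  // is a CliSessionState, i.e. when running under spark-sql.
  val isolationOn =
    if (hiveMetastoreJars == "builtin") !isCliSessionState()
    else true // "maven" or an explicit jars path always gets an isolated loader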


  // create a new session state or reuse the current one, according to
  // `clientLoader.isolationOn`; with isolation off, this returns whatever
  // SessionState was already installed by the first catalog
  val state: SessionState = {
    if (clientLoader.isolationOn) {
      newState()
    } else {
      SessionState.get
    }
  }

  // get conf from the (possibly shared) session state
  def conf: HiveConf = state.getConf

  // create the Hive client from conf and cache it on the client loader
  private def client: Hive = {
    if (clientLoader.cachedHive != null) {
      clientLoader.cachedHive.asInstanceOf[Hive]
    } else {
      val c = getHive(conf)
      clientLoader.cachedHive = c
      c
    }
  }
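
Putting the pieces together, the failure chain under spark-sql looks roughly like this (illustrative pseudo-Scala, not actual Spark code):

  // isolation is off, so hive_catalog falls through to the shared state
  val shared = SessionState.get   // the CliSessionState initialized for spark_catalog
  val hiveConf = shared.getConf   // hive.metastore.uris = thrift://127.0.0.1:19083
  val hive = Hive.get(hiveConf)   // talks to hive2 and is then cached, so the
                                  // hive_catalog-specific uris never take effect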

> DON'T use spark-sql to test hive-related stuff; there is a lot of trickiness inside. Is it reproducible with spark-shell?

It can't be reproduced with spark-shell.
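
For reference, the spark-shell cross-check presumably looks something like this (a sketch reusing the confs from the repro above):

  // ./bin/spark-shell \
  //   --conf spark.sql.catalog.hive_catalog=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog \
  //   --conf spark.sql.catalog.hive_catalog.hive.metastore.uris=thrift://127.0.0.1:9083 \
  //   --conf spark.hadoop.hive.metastore.uris=thrift://127.0.0.1:19083
  spark.sql("use hive_catalog")
  spark.sql("show databases").show() // returns the databases from hive1, as configured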

That KSHC does not work well with spark-sql is a known issue; we have no plan to fix it on the Kyuubi side because we treat it as a Spark-side issue.

Kyuubi is a full drop-in replacement for spark-sql.

spark-sql => beeline => kyuubi => spark driver (client or cluster mode)
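
In practice that means: instead of launching `./bin/spark-sql` directly, connect with beeline to a running Kyuubi server, e.g. `beeline -u 'jdbc:hive2://<kyuubi-host>:10009'` (the host is a placeholder; 10009 is Kyuubi's default frontend port), and run the same SQL there.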

Closed as not planned.