PXF - gs:parquet - Can't get Master Kerberos principal for use as renewer
mpospis opened this issue · 3 comments
Hello,

When we try to query an external table using the `gs:parquet` profile after querying another external table using the `Hive` profile with Kerberos authentication, the statement fails with the following PXF server error:

`[08000] ERROR: PXF server error : Can't get Master Kerberos principal for use as renewer`

However, after restarting PXF, a query selecting data from the external table with the `gs:parquet` profile finishes successfully again, but only until an external table with the `Hive` profile is queried. The attached pxf-service.log file captures DEBUG output for the following sequence of events (a minimal SQL sketch of the sequence follows the list):
- start PXF
- query external table `adhoc.ext_test_gs` pointing to a 'folder' in GCS using the `gs:parquet` profile - finishes successfully
- query external table `adhoc.ext_test_hive` pointing to a Hive table in a cluster with Kerberos auth enabled using the `Hive` profile - finishes successfully
- query external table `adhoc.ext_test_gs` pointing to a 'folder' in GCS using the `gs:parquet` profile - fails with `PXF server error : Can't get Master Kerberos principal for use as renewer`
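For illustration, the sequence boils down to something like the following (a sketch only; `count(*)` stands in for the columns actually selected):

```sql
-- after a fresh PXF start
SELECT count(*) FROM adhoc.ext_test_gs;    -- gs:parquet profile, finishes successfully
SELECT count(*) FROM adhoc.ext_test_hive;  -- Hive profile with Kerberos, finishes successfully
SELECT count(*) FROM adhoc.ext_test_gs;    -- fails: Can't get Master Kerberos principal for use as renewer
```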
The question is whether this is a bug or a misconfiguration issue.
Many thanks!
EDIT: Workaround - removing underscores from the Google Cloud Storage bucket name resolved the issue in our case, e.g. by changing its name from 'project-id-some_name' to 'project-id-some-name'.
Greenplum Version: 6.22.2
PXF version: 6.4.2
gs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>google.cloud.auth.service.account.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>google.cloud.auth.service.account.json.keyfile</name>
        <value>/home/gpadmin/keys/*redacted*</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.gs.impl</name>
        <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    </property>
</configuration>
pxf-env.sh:
#!/bin/bash
##############################################################################
# This file contains PXF properties that can be specified by users #
# to customize their deployments. This file is sourced by PXF Server control #
# scripts upon initialization, start and stop of the PXF Server. #
# #
# To update a property, uncomment the line and provide a new value. #
##############################################################################
# Path to JAVA
# export JAVA_HOME=/opt/jdk11
# Path to Log directory
# export PXF_LOGDIR="${PXF_BASE}/logs"
# Path to Run directory
# export PXF_RUNDIR=${PXF_RUNDIR:=${PXF_BASE}/run}
# Memory
# export PXF_JVM_OPTS="-Xmx2g -Xms1g"
# Kill PXF on OutOfMemoryError, set to false to disable
# export PXF_OOM_KILL=true
# Dump heap on OutOfMemoryError, set to dump path to enable
# export PXF_OOM_DUMP_PATH=${PXF_BASE}/run/pxf_heap_dump
# Additional locations to be class-loaded by PXF
# export PXF_LOADER_PATH=
# Additional native libraries to be loaded by PXF
# export LD_LIBRARY_PATH=
######################################################
# The properties below were added by the pxf migrate
# tool on Fri Jun 18 17:50:49 CEST 2021
######################################################
# export JAVA_HOME="/usr/lib/jvm/java-openjdk/jre"
export JAVA_HOME="/opt/jdk11"
export PXF_JVM_OPTS="-Xmx8g -Xms2g -Dlog4j2.formatMsgNoLookups=true -Duser.timezone=UTC"
adhoc.ext_test_gs:
CREATE EXTERNAL TABLE adhoc.ext_test_gs
(
*redacted|columns*
)
LOCATION ('pxf://*redacted|gs_bucket*/*redacted|gs_object*?PROFILE=gs:parquet&SERVER=*redacted|gs_server*')
FORMAT 'CUSTOM' (FORMATTER = 'pxfwritable_import');
adhoc.ext_test_hive:
CREATE EXTERNAL TABLE adhoc.ext_test_hive
(
*redacted|columns*
)
LOCATION ('pxf://*redacted|schema*.*redacted|table*?PROFILE=Hive&SERVER=*redacted|hive_server*')
FORMAT 'custom' (FORMATTER = 'pxfwritable_import');
Thank you for reporting this issue and providing detailed information. I just attempted a similar scenario in a development/testing environment, but I was not able to reproduce the error. One thing I did notice in the log file you provided is that the Hive table is `EXTERNAL` and not `MANAGED`. What is the (redacted) location in the table definition? Does this error happen if you query a `MANAGED` Hive table instead? Does this error occur if you use `hdfs:parquet` to read the parquet-formatted file directly?
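For reference, a direct `hdfs:parquet` read would look roughly like the sketch below (the table name, column, HDFS path, and server name are placeholders, not taken from the report):

```sql
CREATE EXTERNAL TABLE adhoc.ext_test_hdfs
(
    id int  -- placeholder; use the same column list as adhoc.ext_test_gs
)
LOCATION ('pxf://path/to/parquet/data?PROFILE=hdfs:parquet&SERVER=hdfs_server')
FORMAT 'CUSTOM' (FORMATTER = 'pxfwritable_import');
```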
Hi Bradford,

Thank you for looking into this case. Regarding your questions - I only had a chance to test with an `EXTERNAL` Hive table, since we don't use `MANAGED` tables in our environments. However, I also tried querying via the `hdfs:parquet` profile first, and the same error was reported again when I subsequently queried data in Cloud Storage (via the `adhoc.ext_test_gs` external table).
But I think I found a workaround, at least for our use case. I figured out that if the GCS bucket name contains underscores, the `PXF server error : Can't get Master Kerberos principal for use as renewer` error is always reported after Hive/HDFS has been queried first. In our case the bucket was named 'project-id-some_name'. My apologies that this fact was hidden, or rather not obvious, because of the redacted location (`*redacted|gs_bucket*`) I provided. After I got rid of the underscores, i.e. named the bucket 'project-id-some-name', the error is gone and querying data in Cloud Storage using the `gs:parquet` profile works.
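In other words, only the bucket segment of the external table's LOCATION changed; a sketch (the object path, server name, and column are illustrative placeholders):

```sql
-- the bucket was effectively renamed from 'project-id-some_name' to 'project-id-some-name'
-- and the external table recreated with the new bucket in the LOCATION URL:
CREATE EXTERNAL TABLE adhoc.ext_test_gs
(
    id int  -- placeholder for the redacted column list
)
LOCATION ('pxf://project-id-some-name/some/folder?PROFILE=gs:parquet&SERVER=gs_server')
FORMAT 'CUSTOM' (FORMATTER = 'pxfwritable_import');
```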
One more interesting thing is that this doesn't seem to affect `WRITABLE EXTERNAL` tables with the `gs:parquet` profile, because writing worked for me even when the bucket name contained underscores.
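For completeness, the writable variant referred to above looks roughly like this (names, column, and path are placeholders; note the exporting formatter):

```sql
CREATE WRITABLE EXTERNAL TABLE adhoc.ext_test_gs_write
(
    id int  -- placeholder column
)
LOCATION ('pxf://project-id-some_name/some/folder?PROFILE=gs:parquet&SERVER=gs_server')
FORMAT 'CUSTOM' (FORMATTER = 'pxfwritable_export');
```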
I was able to reproduce the error in a dev/testing environment by using a bucket with an underscore in its name (e.g., `<project-id>-bradford_scratch`). I've added a bug to our team's internal tracker to investigate further and find the root cause of this error.