Error when exporting metastore
I am getting an error when trying to export the metastore for a database whose tables point to files on an Azure Storage account. Here is the error:
Table: account_diagnosis_related_group_fact
post: https://eastus.azuredatabricks.net/api/1.2/commands/execute
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Logging failure
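For context, those post/Get lines are the export tool executing a statement on the cluster through the Databricks 1.2 command-execution API and polling for the result. Below is a minimal sketch of that execute-and-poll pattern; the token, cluster ID, and context ID are hypothetical placeholders, and the real tool creates and manages its own execution context:

import time
import requests

HOST = "https://eastus.azuredatabricks.net"
TOKEN = "<api-token>"        # placeholder; on a passthrough cluster this must be an AAD token
CLUSTER_ID = "<cluster-id>"  # placeholder
CONTEXT_ID = "<context-id>"  # placeholder; created via /api/1.2/contexts/create
headers = {"Authorization": f"Bearer {TOKEN}"}

# POST /api/1.2/commands/execute runs one statement on the cluster
resp = requests.post(
    f"{HOST}/api/1.2/commands/execute",
    headers=headers,
    json={
        "clusterId": CLUSTER_ID,
        "contextId": CONTEXT_ID,
        "language": "python",
        "command": 'spark.sql("show create table dw.account_diagnosis_related_group_fact").collect()[0][0]',
    },
)
command_id = resp.json()["id"]

# GET /api/1.2/commands/status is polled until the command finishes,
# which is why the status line repeats in the log above
while True:
    status = requests.get(
        f"{HOST}/api/1.2/commands/status",
        headers=headers,
        params={"clusterId": CLUSTER_ID, "contextId": CONTEXT_ID, "commandId": command_id},
    ).json()
    if status["status"] in ("Finished", "Error", "Cancelled"):
        break
    time.sleep(1)

# On a passthrough cluster authenticated with a PAT, the failure above
# surfaces in the command results as AzureCredentialNotFoundException
print(status.get("results"))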
Within a notebook attached to the same Spark cluster, I can run the exact statement I find in failed_metastore.log and I don't get an error. Here is the Python statement that runs successfully (a bulk variant of the same idea is sketched after the output):
spark.sql("show create table dw.account_diagnosis_related_group_fact").collect()[0][0]
Out[2]: "CREATE TABLE dw
.account_diagnosis_related_group_fact
(\n account_diagnosis_related_group_dimension_key
BIGINT,\n effective_from_date
DATE,\n effective_to_date
DATE,\n tenant_key
BIGINT,\n account_dimension_key
BIGINT,\n account_key
BIGINT,\n diagnosis_related_group_dimension_key
BIGINT,\n diagnosis_related_group_code
STRING,\n relationship_type_code_key
BIGINT,\n relationship_type_code
STRING,\n source_code_key
BIGINT,\n source_code
STRING,\n diagnosis_related_group_condition_code_key
BIGINT,\n diagnosis_related_group_condition_code
STRING,\n diagnosis_related_group_condition_description
STRING,\n diagnosis_related_group_length_of_stay_days_count
INT,\n diagnosis_related_group_qualifier_code_key
BIGINT,\n diagnosis_related_group_qualifier_code
STRING,\n diagnosis_related_group_qualifier_description
STRING,\n illness_severity_class_code_key
BIGINT,\n illness_severity_class_code
STRING,\n illness_severity_class_description
STRING,\n mortality_risk_class_code_key
BIGINT,\n mortality_risk_class_code
STRING,\n mortality_risk_class_description
STRING,\n arithmetic_average_length_of_stay
DECIMAL(5,2),\n geometric_average_length_of_stay
DECIMAL(5,2),\n relative_weighting_factor
DECIMAL(18,4),\n diagnosis_related_group_sequence
BIGINT,\n diagnosis_related_group_billing_indicator
INT,\n account_diagnosis_related_group_count
INT,\n document_key
BIGINT,\n document_dimension_key
BIGINT,\n diagnosis_related_group_comparison_indicator
INT)\nUSING delta\nLOCATION 'abfss://lake@storlakeprod.dfs.core.windows.net/edw/prod/dw/account_diagnosis_related_group_fact.delta'\n"
Are you exporting from an existing cluster, or using the default cluster?
You can use this option to export from an existing cluster:
--cluster-name CLUSTER_NAME
Cluster name to export the metastore to a specific
cluster. Cluster will be started.
Yes, I am passing the cluster name, and it is the same cluster I ran the statement on. Here is the export command:
python3 ./export_db.py --azure --metastore --skip-failed --database dw --cluster-name spark --profile bricks-workspace-qsrp
This cluster is called "spark" and has credential passthrough enabled.
Since you're using passthrough, you'll need to authenticate with your AAD token instead of a Databricks personal access token.
The spark cluster is configured for credential passthrough, but I am not sure how to act on your suggestion.
The token stored in .databrickscfg for that profile was created using the Databricks CLI. How can I get an AAD token and pass it to the export_db.py tool?
Can you work with your Databricks account team to help walk you through getting your AAD token?
Could you be more specific, please? Is this a configuration that needs to be changed on the Databricks cluster, or is it a token that I need to obtain and pass to the export tool? I think what you are referring to is authenticating with a service principal. Is this correct?
Here's a link on how to generate the AAD token.
https://documenter.getpostman.com/view/2644780/SzmcZe3a
I can't provide more details since I'm not familiar with how Azure + Passthrough are configured, hence the request to reach out to your Databricks account team for further assistance.
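For reference, the linked page describes requesting an AAD token for the Azure Databricks resource using the client-credentials flow. A minimal sketch of that request; the tenant ID, client ID, and secret are placeholders, and 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known Azure Databricks resource ID:

import requests

TENANT_ID = "<tenant-id>"                      # placeholder
CLIENT_ID = "<service-principal-app-id>"       # placeholder
CLIENT_SECRET = "<service-principal-secret>"   # placeholder
RESOURCE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"  # Azure Databricks resource ID

# Client-credentials token request against Azure AD
resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "resource": RESOURCE,
    },
)
resp.raise_for_status()
print(resp.json()["access_token"])  # AAD bearer token for Azure Databricks

Assuming export_db.py reads credentials from ~/.databrickscfg, placing this token in the token field of the profile passed via --profile should let the tool authenticate with AAD instead of a personal access token.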