mrchristine/db-migration

Error when exporting metastore


I am getting an error when trying to export the metastore for a database with tables that point to files on an Azure Storage account.

Here is the error:

Table: account_diagnosis_related_group_fact
post: https://eastus.azuredatabricks.net/api/1.2/commands/execute
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Logging failure

Within a notebook, I can run the same statement that appears in failed_metastore.log, on the same Spark cluster, and I don't get an error.

Here is the Python statement I can run successfully:

spark.sql("show create table dw.account_diagnosis_related_group_fact").collect()[0][0]

Out[2]: "CREATE TABLE dw.account_diagnosis_related_group_fact (\n account_diagnosis_related_group_dimension_key BIGINT,\n effective_from_date DATE,\n effective_to_date DATE,\n tenant_key BIGINT,\n account_dimension_key BIGINT,\n account_key BIGINT,\n diagnosis_related_group_dimension_key BIGINT,\n diagnosis_related_group_code STRING,\n relationship_type_code_key BIGINT,\n relationship_type_code STRING,\n source_code_key BIGINT,\n source_code STRING,\n diagnosis_related_group_condition_code_key BIGINT,\n diagnosis_related_group_condition_code STRING,\n diagnosis_related_group_condition_description STRING,\n diagnosis_related_group_length_of_stay_days_count INT,\n diagnosis_related_group_qualifier_code_key BIGINT,\n diagnosis_related_group_qualifier_code STRING,\n diagnosis_related_group_qualifier_description STRING,\n illness_severity_class_code_key BIGINT,\n illness_severity_class_code STRING,\n illness_severity_class_description STRING,\n mortality_risk_class_code_key BIGINT,\n mortality_risk_class_code STRING,\n mortality_risk_class_description STRING,\n arithmetic_average_length_of_stay DECIMAL(5,2),\n geometric_average_length_of_stay DECIMAL(5,2),\n relative_weighting_factor DECIMAL(18,4),\n diagnosis_related_group_sequence BIGINT,\n diagnosis_related_group_billing_indicator INT,\n account_diagnosis_related_group_count INT,\n document_key BIGINT,\n document_dimension_key BIGINT,\n diagnosis_related_group_comparison_indicator INT)\nUSING delta\nLOCATION 'abfss://lake@storlakeprod.dfs.core.windows.net/edw/prod/dw/account_diagnosis_related_group_fact.delta'\n"

Are you exporting from an existing cluster, or using the default cluster?
You can use this option to export from an existing cluster:

--cluster-name CLUSTER_NAME
                        Cluster name to export the metastore to a specific
                        cluster. Cluster will be started.

Yes, I am using the cluster name, which is the same cluster I ran the statement on. Here is the export command:

python3 ./export_db.py --azure --metastore --skip-failed --database dw --cluster-name spark --profile bricks-workspace-qsrp

This cluster is called "spark" and has credential passthrough enabled.

Since you're using passthrough, you'll need to use your AAD token instead of a Databricks personal access token.
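
For illustration, here's a minimal sketch of what that means at the REST level: the AAD token replaces the PAT in the same Authorization header. The host is taken from your logs above, and the token value is a placeholder; this isn't the tool's code, just a demonstration of the auth swap.

import requests

# Placeholder values for illustration only -- substitute your workspace URL
# and an AAD access token (see the link further down for how to generate one).
host = "https://eastus.azuredatabricks.net"
aad_token = "<your AAD access token>"

# The Databricks REST API accepts an AAD token as a Bearer token,
# in the same header slot where a personal access token would go.
resp = requests.get(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {aad_token}"},
)
print(resp.status_code)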

The spark cluster is configured for passthrough credentials, but I am not sure how to go about your proposal.
The token is stored in .databrickscfg under the profile for that workspace, which was created using the Databricks CLI. How can I get an AAD token and pass it to the export_db.py tool?
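
For reference, the profile entry in ~/.databrickscfg looks something like this (the host matches my workspace and the token value is redacted):

[bricks-workspace-qsrp]
host = https://eastus.azuredatabricks.net
token = <personal access token generated via the Databricks CLI>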

Can you work with your Databricks account team to walk you through getting your AAD token?

Could you please clarify? Is this a configuration that needs to be changed on the Databricks cluster, or is it a token that I need to obtain and pass to the export tool? I think what you are referring to is authenticating using a service principal. Is this correct?

Here's a link on how to generate the AAD token.
https://documenter.getpostman.com/view/2644780/SzmcZe3a
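
As a rough sketch of what that link walks through, here's one way to request an AAD token for the Azure Databricks resource using the OAuth2 client-credentials flow with a service principal. The tenant ID, client ID, and secret below are placeholders you'd take from your own Azure app registration.

import requests

# Placeholder values -- substitute your own Azure AD tenant and app registration.
tenant_id = "<your-azure-tenant-id>"
client_id = "<your-service-principal-client-id>"
client_secret = "<your-service-principal-secret>"

# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known application ID
# that identifies the Azure Databricks resource in Azure AD.
resp = requests.post(
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d",
    },
)
resp.raise_for_status()
print(resp.json()["access_token"])

My assumption is that the resulting access_token would then go in the token field of your .databrickscfg profile in place of the PAT, since the tool reads credentials from the profile; your account team can confirm the exact flow for a passthrough cluster.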

I can't provide more details since I'm not familiar with how Azure + passthrough is configured in your environment, hence the request to reach out to your Databricks account team for further assistance.