mrchristine/db-migration

Error when exporting metastore


I am getting an error when trying to export the metastore for a database with tables that point to files on an Azure Storage account.

Here is the error:

Table: account_diagnosis_related_group_fact
post: https://eastus.azuredatabricks.net/api/1.2/commands/execute
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Logging failure

Within a notebook, I can run the same statement that appears in failed_metastore.log, on the same Spark cluster, and I don't get an error.

Here is the Python statement I can run successfully:

spark.sql("show create table dw.account_diagnosis_related_group_fact").collect()[0][0]

Out[2]: "CREATE TABLE dw.account_diagnosis_related_group_fact (\n account_diagnosis_related_group_dimension_key BIGINT,\n effective_from_date DATE,\n effective_to_date DATE,\n tenant_key BIGINT,\n account_dimension_key BIGINT,\n account_key BIGINT,\n diagnosis_related_group_dimension_key BIGINT,\n diagnosis_related_group_code STRING,\n relationship_type_code_key BIGINT,\n relationship_type_code STRING,\n source_code_key BIGINT,\n source_code STRING,\n diagnosis_related_group_condition_code_key BIGINT,\n diagnosis_related_group_condition_code STRING,\n diagnosis_related_group_condition_description STRING,\n diagnosis_related_group_length_of_stay_days_count INT,\n diagnosis_related_group_qualifier_code_key BIGINT,\n diagnosis_related_group_qualifier_code STRING,\n diagnosis_related_group_qualifier_description STRING,\n illness_severity_class_code_key BIGINT,\n illness_severity_class_code STRING,\n illness_severity_class_description STRING,\n mortality_risk_class_code_key BIGINT,\n mortality_risk_class_code STRING,\n mortality_risk_class_description STRING,\n arithmetic_average_length_of_stay DECIMAL(5,2),\n geometric_average_length_of_stay DECIMAL(5,2),\n relative_weighting_factor DECIMAL(18,4),\n diagnosis_related_group_sequence BIGINT,\n diagnosis_related_group_billing_indicator INT,\n account_diagnosis_related_group_count INT,\n document_key BIGINT,\n document_dimension_key BIGINT,\n diagnosis_related_group_comparison_indicator INT)\nUSING delta\nLOCATION 'abfss://lake@storlakeprod.dfs.core.windows.net/edw/prod/dw/account_diagnosis_related_group_fact.delta'\n"

Are you exporting from an existing cluster, or using the default cluster?
You can use this option to export from an existing cluster:

--cluster-name CLUSTER_NAME
                        Cluster name to export the metastore to a specific
                        cluster. Cluster will be started.

Yes, I am using the cluster name, which is the same cluster I ran the statement on. Here is the export command:

python3 ./export_db.py --azure --metastore --skip-failed --database dw --cluster-name spark --profile bricks-workspace-qsrp

This cluster is called "spark" and has credential passthrough enabled.

Since you're using passthrough, you'll need to use your AAD token instead of a Databricks personal access token.
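
For illustration, here's a minimal sketch of what that means at the REST level: the AAD token replaces the PAT in the same Authorization header. The host is taken from your logs above, and the token value is a placeholder; this isn't the tool's code, just a demonstration of the auth swap.

import requests

# Placeholder values for illustration only -- substitute your workspace URL
# and an AAD access token (see the link further down for how to generate one).
host = "https://eastus.azuredatabricks.net"
aad_token = "<your AAD access token>"

# The Databricks REST API accepts an AAD token as a Bearer token,
# in the same header slot where a personal access token would go.
resp = requests.get(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {aad_token}"},
)
print(resp.status_code)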

The spark cluster is configured for passthrough credentials, but I am not sure how to go about your proposal.
The token is stored in .databrickscfg under the profile for that workspace, which was created using the Databricks CLI. How can I get an AAD token and pass it to the export_db.py tool?
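
For reference, the profile entry in ~/.databrickscfg looks something like this (the host matches my workspace and the token value is redacted):

[bricks-workspace-qsrp]
host = https://eastus.azuredatabricks.net
token = <personal access token generated via the Databricks CLI>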

Can you work with your Databricks account team to walk you through getting your AAD token?

Could you please clarify? Is this a configuration that needs to be changed on the Databricks cluster, or is it a token that I need to obtain and pass to the export tool? I think what you are referring to is authenticating using a service principal. Is this correct?

Here's a link on how to generate the AAD token.
https://documenter.getpostman.com/view/2644780/SzmcZe3a
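
As a rough sketch of what that link walks through, here's one way to request an AAD token for the Azure Databricks resource using the OAuth2 client-credentials flow with a service principal. The tenant ID, client ID, and secret below are placeholders you'd take from your own Azure app registration.

import requests

# Placeholder values -- substitute your own Azure AD tenant and app registration.
tenant_id = "<your-azure-tenant-id>"
client_id = "<your-service-principal-client-id>"
client_secret = "<your-service-principal-secret>"

# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known application ID
# that identifies the Azure Databricks resource in Azure AD.
resp = requests.post(
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d",
    },
)
resp.raise_for_status()
print(resp.json()["access_token"])

My assumption is that the resulting access_token would then go in the token field of your .databrickscfg profile in place of the PAT, since the tool reads credentials from the profile; your account team can confirm the exact flow for a passthrough cluster.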

I can't provide more details since I'm not familiar with how Azure + passthrough is configured in your environment, hence the request to reach out to your Databricks account team for further assistance.