mrchristine/db-migration

Import metastore extract onto external SQL server.

Closed this issue · 8 comments

Hi Team,

Kindly let us know if there is a way to dump the extracted metadata of a Databricks workspace into a SQL database.

Is there a utility for this, similar to how we imported and exported the metastore using the Python script?

What specific metadata of a Databricks workspace are you looking for?
There are many areas of a Databricks workspace, and this tool exports them via supported APIs.

Are you looking to recreate a metastore backed by a SQL DB, or are you looking to build a metadata service that you can query via a SQL DB?

The tool uses the SparkSQL client to pull metadata info from the HiveMetastoreClient class, and is meant to interface via Spark. There's no direct way to import the metastore into another backing database. The best way to do that would be to set up another Spark cluster attached to that metastore and import the tables specifically on that cluster.
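In case it helps, here's a rough sketch of the kind of cluster Spark config you'd use to attach a cluster to an external Hive metastore backed by SQL Server. The host, database name, credentials, and metastore version below are placeholders for your environment, not values this tool sets for you:

```
# Sketch: cluster Spark config pointing at an external Hive metastore on SQL Server.
# JDBC connection to the metastore database (placeholders, adjust for your environment).
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<host>:1433;database=<metastore_db>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
# Hive metastore client version and jars; must match the schema version of the external metastore.
spark.sql.hive.metastore.version 2.3.7
spark.sql.hive.metastore.jars builtin
```

Once a cluster is provisioned against that metastore, importing is essentially re-running the exported table DDLs on that cluster so they land in the external backing database.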

I could add an option to the import phase to attach to a specific cluster name if provided, but you would need to ensure the metastore is provisioned correctly.

I'll take some time to add it next week. Thanks for the feedback.

@raknaik I've added support for --cluster-name as an import option so you can connect to a separate cluster and run the import.

Please test it out and let me know how it works for you. I'll close this issue once you report back.
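For reference, the import invocation would look roughly like the sketch below. The script name and the metastore flag are how I'd expect it to be run rather than confirmed usage, so check the README; only --cluster-name is the new option discussed here:

```
# Sketch: run the metastore import against an existing cluster.
# --cluster-name points the import at a named cluster instead of creating a new one.
python import_db.py --metastore --cluster-name my-interactive-cluster
```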

@mrchristine: Can we use the same --cluster-name parameter when we export the metadata (DDLs)? Currently a cluster named API_Metastore_Work_Leave_Me_Alone gets created to do the work. If we can use --cluster-name, I would prefer to use an existing interactive cluster to do the job.

@arjun-hareendran the --cluster-name option now works for the metadata export as well.
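Roughly, the export side would then be invoked like this. Again, the script name and the metastore flag are illustrative placeholders; the only part confirmed above is --cluster-name:

```
# Sketch: export the metastore DDLs using an existing interactive cluster
# instead of the auto-created API_Metastore_Work_Leave_Me_Alone cluster.
python export_db.py --metastore --cluster-name my-interactive-cluster
```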