Adding support for Database in Manifest Json
Opened this issue · 3 comments
Describe the feature
Sometimes users need to have a reference to the database in manifest.json, in order to feed the manifest in 3rd party tools.
The Database category is not used in Teradata since the equivalent is schema, in the manifest the database is populated as Null.
The request from users is that if the Database is defined in the configurations, then the database is populated, this only for documentation matters.
For generating the database artifacts, only the schema will continue to be relevant, as it is right now.
Describe alternatives you've considered
Not applicable.
Additional context
This is only related to documentation and manifest file.
Who will this benefit?
Users that feed the manifest in other 3rd party tools.
Are you interested in contributing this feature?
I could.
I had this issue and posted in the chat in Slack and wasn't able to ever get it working.
https://getdbt.slack.com/archives/C027B6BHMT3/p1680535202992639
here was the response from the teradata person in the chat:
Hi there, since Teradata doesn’t implement the database -> schema hierarchy and has only databases instead, we had to decide how to map the database -> schema hierarchy to the Teradata world. The decision was to use the schema (as it’s part of the fully qualified relation name in dbt). When you configure dbt-teradata, you have two options:
Leave database field blank (translates to null).
Set database field to the schema name.
It sounds like you use #1, and the thing your are trying to do doesn’t like it. Try approach #2 and see if that works better with the export step.
@Austin1 Thanks for the feedback. Actually I opened this issue to give the team visibility on the topic you noticed in Slack.
An update to the codebase is needed to enable the steps above to attain the desired objective.
I made modifications to the dbt-teradata codebase to assign the schema value to the database name by adjusting the dbt/include/teradata/macros/adapters.sql file as follows:
{% macro teradata__generate_database_name(custom_database_name=none, node=none) -%}
{%- set schema_name = node.schema -%}
{{ schema_name }}
{%- endmacro %}
This change enables the schema name to be returned whenever dbt-core attempts to retrieve the database name, successfully adding the database name to the manifest.json file instead of returning None.
Issue with catalog.json
However, I encountered an issue with the generation of the catalog.json file when running the dbt docs generate command. This problem is linked to the TeradataIncludePolicy in the dbt/adapters/teradata/relation.py file:
class TeradataQuotePolicy(Policy):
database: bool = False
schema: bool = True
identifier: bool = True
According to this policy, dbt-core renders the list of nodes without including the database name in path_part, resulting in an empty list of nodes in the catalog.json file.
Consideration of Teradata's Schema Implementation
As Teradata does not implement a database -> schema hierarchy and instead relies on databases directly, we have chosen to use the schema as part of the fully qualified relation name in dbt. Altering the policy to set the database flag to True would cause all SQL references to adopt the hierarchy database_name.schema_name.relation_name, which is not valid in Teradata.
Impact on Integration with Other Tools
It's also important to consider the potential impact on dbt-teradata integrations with other tools. These tools rely on the manifest.json and catalog.json files to understand the dbt project structure (For example dagster dbt). If we add the database name to these files, other integration tools might mistakenly interpret the hierarchy as database_name.schema_name.relation_name, leading to further complications.
Conclusion
Given these observations, it seems prudent to adhere to the existing TeradataIncludePolicy and allow the database name to remain None in both the manifest.json and catalog.json files.