dbt-labs/dbt-core

[Bug] Renaming or removing a contracted model should raise a BreakingChange warning/error

jtcohen6 opened this issue · 4 comments

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When a contracted model changes or disables its contract, dbt raises a warning (UnversionedBreakingChange) or error (ContractBreakingChangeError) within the same_contract check triggered by state:modified selection/comparison.

However, the same warning/error does not occur if the user simply deletes or renames a contracted model. I believe it should — until/unless unless that contracted model has passed a defined deprecation_date, at which point deletion is allowed.

Expected Behavior

If a contracted model is present in the comparison manifest, and missing in the current manifest, raise a warning (if unversioned) or error (if versioned) accordingly. Add a new type of "breaking change" for "renamed or removed."

Naïve implementation for illustrative purposes only:

class StateSelectorMethod(SelectorMethod):
    ...
    def check_for_deleted_contracted_models(self) -> None:
        old_contracted_nodes = set(
            k for k, v in self.previous_state.manifest.nodes.items() if v.config.contract.enforced
        )
        new_contracted_nodes = set(
            k for k, v in self.manifest.nodes.items() if v.config.contract.enforced
        )
        for unique_id in old_contracted_nodes - new_contracted_nodes:
            node = self.previous_state.manifest.nodes[unique_id]
            if (
                node.deprecation_date
                and node.deprecation_date < datetime.datetime.now().astimezone()
            ):
                # Passed its deprecation_date, so deletion is allowed
                continue
            if node.version is None:
                print("WARNING! Unversioned contracted model renamed or removed")
                # This should be warn_or_error(UnversionedBreakingChange), and we probably want a new "breaking_change" reason
            else:
                print("ERROR! Versioned contracted model renamed or removed")
                # This should raise ContractBreakingChangeError
    ...
    def search(self, included_nodes: Set[UniqueId], selector: str) -> Iterator[UniqueId]:
        if self.previous_state is None or self.previous_state.manifest is None:
            raise DbtRuntimeError("Got a state selector method, but no comparison manifest")
        self.check_for_deleted_contracted_models()
        ...

Steps To Reproduce

  1. Create a model configured with contract: {enforced: true}
  2. Parse the project / run the model
  3. Move the manifest to a folder named state/
  4. Delete or rename the model
  5. dbt list -s state:modified --state
  6. No warning or error

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.10.11
- dbt: v1.8 / main

Which database adapter are you using with dbt?

No response

Additional Context

No response

@MichelleArk link an example of the current implementation for similar errors in the contracted model.

The existing call sites for UnversionedBreakingChange and ContractBreakingChangeError are in same_contents:

def same_contract(self, old, adapter_type=None) -> bool:
# If the contract wasn't previously enforced:
if old.contract.enforced is False and self.contract.enforced is False:
# No change -- same_contract: True
return True
if old.contract.enforced is False and self.contract.enforced is True:
# Now it's enforced. This is a change, but not a breaking change -- same_contract: False
return False
# Otherwise: The contract was previously enforced, and we need to check for changes.
# Happy path: The contract is still being enforced, and the checksums are identical.
if self.contract.enforced is True and self.contract.checksum == old.contract.checksum:
# No change -- same_contract: True
return True
# Otherwise: There has been a change.
# We need to determine if it is a **breaking** change.
# These are the categories of breaking changes:
contract_enforced_disabled: bool = False
columns_removed: List[str] = []
column_type_changes: List[Dict[str, str]] = []
enforced_column_constraint_removed: List[
Dict[str, str]
] = [] # column_name, constraint_type
enforced_model_constraint_removed: List[Dict[str, Any]] = [] # constraint_type, columns
materialization_changed: List[str] = []
if old.contract.enforced is True and self.contract.enforced is False:
# Breaking change: the contract was previously enforced, and it no longer is
contract_enforced_disabled = True
# TODO: this avoid the circular imports but isn't ideal
from dbt.adapters.base import ConstraintSupport
from dbt.adapters.factory import get_adapter_constraint_support
constraint_support = get_adapter_constraint_support(adapter_type)
column_constraints_exist = False
# Next, compare each column from the previous contract (old.columns)
for old_key, old_value in sorted(old.columns.items()):
# Has this column been removed?
if old_key not in self.columns.keys():
columns_removed.append(old_value.name)
# Has this column's data type changed?
elif old_value.data_type != self.columns[old_key].data_type:
column_type_changes.append(
{
"column_name": str(old_value.name),
"previous_column_type": str(old_value.data_type),
"current_column_type": str(self.columns[old_key].data_type),
}
)
# track if there are any column level constraints for the materialization check late
if old_value.constraints:
column_constraints_exist = True
# Have enforced columns level constraints changed?
# Constraints are only enforced for table and incremental materializations.
# We only really care if the old node was one of those materializations for breaking changes
if (
old_key in self.columns.keys()
and old_value.constraints != self.columns[old_key].constraints
and old.materialization_enforces_constraints
):
for old_constraint in old_value.constraints:
if (
old_constraint not in self.columns[old_key].constraints
and constraint_support[old_constraint.type] == ConstraintSupport.ENFORCED
):
enforced_column_constraint_removed.append(
{
"column_name": old_key,
"constraint_name": old_constraint.name,
"constraint_type": ConstraintType(old_constraint.type),
}
)
# Now compare the model level constraints
if old.constraints != self.constraints and old.materialization_enforces_constraints:
for old_constraint in old.constraints:
if (
old_constraint not in self.constraints
and constraint_support[old_constraint.type] == ConstraintSupport.ENFORCED
):
enforced_model_constraint_removed.append(
{
"constraint_name": old_constraint.name,
"constraint_type": ConstraintType(old_constraint.type),
"columns": old_constraint.columns,
}
)
# Check for relevant materialization changes.
if (
old.materialization_enforces_constraints
and not self.materialization_enforces_constraints
and (old.constraints or column_constraints_exist)
):
materialization_changed = [old.config.materialized, self.config.materialized]
# If a column has been added, it will be missing in the old.columns, and present in self.columns
# That's a change (caught by the different checksums), but not a breaking change
# Did we find any changes that we consider breaking? If there's an enforced contract, that's
# a warning unless the model is versioned, then it's an error.
if (
contract_enforced_disabled
or columns_removed
or column_type_changes
or enforced_model_constraint_removed
or enforced_column_constraint_removed
or materialization_changed
):
breaking_changes = []
if contract_enforced_disabled:
breaking_changes.append(
"Contract enforcement was removed: Previously, this model had an enforced contract. It is no longer configured to enforce its contract, and this is a breaking change."
)
if columns_removed:
columns_removed_str = "\n - ".join(columns_removed)
breaking_changes.append(f"Columns were removed: \n - {columns_removed_str}")
if column_type_changes:
column_type_changes_str = "\n - ".join(
[
f"{c['column_name']} ({c['previous_column_type']} -> {c['current_column_type']})"
for c in column_type_changes
]
)
breaking_changes.append(
f"Columns with data_type changes: \n - {column_type_changes_str}"
)
if enforced_column_constraint_removed:
column_constraint_changes_str = "\n - ".join(
[
f"'{c['constraint_name'] if c['constraint_name'] is not None else c['constraint_type']}' constraint on column {c['column_name']}"
for c in enforced_column_constraint_removed
]
)
breaking_changes.append(
f"Enforced column level constraints were removed: \n - {column_constraint_changes_str}"
)
if enforced_model_constraint_removed:
model_constraint_changes_str = "\n - ".join(
[
f"'{c['constraint_name'] if c['constraint_name'] is not None else c['constraint_type']}' constraint on columns {c['columns']}"
for c in enforced_model_constraint_removed
]
)
breaking_changes.append(
f"Enforced model level constraints were removed: \n - {model_constraint_changes_str}"
)
if materialization_changed:
materialization_changes_str = (
f"{materialization_changed[0]} -> {materialization_changed[1]}"
)
breaking_changes.append(
f"Materialization changed with enforced constraints: \n - {materialization_changes_str}"
)
if self.version is None:
warn_or_error(
UnversionedBreakingChange(
contract_enforced_disabled=contract_enforced_disabled,
columns_removed=columns_removed,
column_type_changes=column_type_changes,
enforced_column_constraint_removed=enforced_column_constraint_removed,
enforced_model_constraint_removed=enforced_model_constraint_removed,
breaking_changes=breaking_changes,
model_name=self.name,
model_file_path=self.original_file_path,
),
node=self,
)
else:
raise (
ContractBreakingChangeError(
breaking_changes=breaking_changes,
node=self,
)
)

However, in the case of a model that's been renamed/removed, we can't rely on same_contents, because that is called once for each node within the current project.

Acceptance criteria

  • Add new attribute (renamed_or_removed: bool) to UnversionedBreakingChange
  • Add check for old nodes missing in current manifest to state:modified (naïve example above). Depending on whether the node is versioned, and whether it's passed its deprecation_date, fire UnversionedBreakingChange event or raise ContractBreakingChangeError error. These should include "The model has been renamed or removed" within their list of breaking_changes.
  • Add tests

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#5574