astronomer/astronomer-cosmos

Allow TestBehavior.AFTER_EACH with DBT_MANIFEST load method

Closed this issue · 1 comment

Context

When running cosmos with LoadMode.DBT_MANIFEST, I noticed that all tests would disappear when trying to filter via RenderConfig.select. The tests show up properly when rendering the full dbt project (removing the select filter), so I assumed this issue was related to the select/exclude logic. I tested this on my own project as well as with cosmos_manifest_example.py in this repo. Below are some screenshots of the behavior I observed with the example DAG in this repo.

With

render_config=RenderConfig(load_method=LoadMode.DBT_MANIFEST),

[screenshot: manifest_no_filtering]

The tests show up properly here.

With

render_config=RenderConfig(load_method=LoadMode.DBT_MANIFEST, select=["+customers"]),

[screenshot: manifest_with_filtering]

The tests do not show up once the select=["+customers"] is included in the RenderConfig.

Upon further investigation, it seems this issue stems from the fact that test nodes are not included in the depends_on field in manifest.json. The logic in selector.py builds the dbt graph based on this depends_on field, and therefore winds up filtering out all test nodes, so they are not included in the filtered_nodes produced by this step in graph.py:

            self.filtered_nodes = select_nodes(
                project_dir=self.execution_config.project_path,
                nodes=nodes,
                select=self.render_config.select,
                exclude=self.render_config.exclude,
            )
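To illustrate the failure mode, here is a minimal, self-contained sketch (hypothetical dict-based node shapes, not the real cosmos classes): in manifest.json, a test's depends_on lists the model it tests, but the model's depends_on does not list the test, so a depends_on-driven walk from the selected model never reaches the test.

```python
# Hypothetical node shapes for illustration only. The key asymmetry:
# the test lists the model in its depends_on, but not vice versa.
nodes = {
    "model.jaffle_shop.customers": {
        "resource_type": "model",
        "depends_on": ["model.jaffle_shop.stg_customers"],
    },
    "model.jaffle_shop.stg_customers": {
        "resource_type": "model",
        "depends_on": [],
    },
    "test.jaffle_shop.not_null_customers_id": {
        "resource_type": "test",
        # The test points at the model it tests...
        "depends_on": ["model.jaffle_shop.customers"],
    },
}

def select_with_ancestors(nodes, target):
    """Mimic a '+customers' selector: the target plus everything it depends on."""
    selected = set()
    stack = [target]
    while stack:
        node_id = stack.pop()
        if node_id in selected:
            continue
        selected.add(node_id)
        stack.extend(nodes[node_id]["depends_on"])
    return selected

selected = select_with_ancestors(nodes, "model.jaffle_shop.customers")
# Both models are selected, but the test node is silently dropped,
# because no selected node's depends_on ever mentions it.
assert "test.jaffle_shop.not_null_customers_id" not in selected
```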

Solution
There are a few different ways we could update the logic to allow model filtering while maintaining the proper testing behavior:

  1. Update the update_node_dependency() method, which is called after the filtering process completes, to add any test nodes that test a node in self.filtered_nodes.

  2. Update the resources dict to add each test node to the depends_on field of the node(s) it tests.

  3. Update the selector.py logic to account for the fact that test nodes do not show up in the depends_on field of the nodes they test. This is probably the most complex option, as it would require changing the filtering logic itself; the GraphSelector class appears to be the one that needs updating.
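As a rough sketch of option 1 (using a hypothetical dict-based node shape rather than the real cosmos DbtNode class), a post-filtering pass could re-attach any test whose tested node survived the select/exclude filter:

```python
def readd_tests(all_nodes, filtered_nodes):
    """Sketch of option 1: after select/exclude filtering, re-add any
    test node whose depends_on references a node that survived the
    filter. (Illustrative only; not the actual cosmos implementation.)"""
    result = dict(filtered_nodes)
    for node_id, node in all_nodes.items():
        if node["resource_type"] != "test":
            continue
        if any(dep in filtered_nodes for dep in node["depends_on"]):
            result[node_id] = node
    return result

# Example: the selector picked only model.a; its test should come back,
# while model.b and its test stay excluded.
all_nodes = {
    "model.a": {"resource_type": "model", "depends_on": []},
    "test.a_check": {"resource_type": "test", "depends_on": ["model.a"]},
    "model.b": {"resource_type": "model", "depends_on": []},
    "test.b_check": {"resource_type": "test", "depends_on": ["model.b"]},
}
filtered = readd_tests(all_nodes, {"model.a": all_nodes["model.a"]})
assert "test.a_check" in filtered
assert "test.b_check" not in filtered
```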

I can put up a PR for option 1, but I am open to any thoughts or suggestions on what makes the most sense. Also, please let me know if there is an easier fix or workaround that I am missing. Thanks!

BTW, I'm seeing a failure in the pre-commit mypy check even without any changes made on my feature branch yet. I noticed other PRs seem to skip that check; is that something I should be doing? Is there a way to skip it?

Hi @chris-okorodudu, thanks for reporting, suggesting fixes, and being available to fix this issue!

I believe (1) is a pragmatic solution to the problem; please go ahead with it as a starting point, and we'll review whether there are any corner cases we may not be considering.

I just logged an issue so we could fix the pre-commit issue; I hadn't realized we had this problem: #952
Until that issue is solved, you should be able to use SKIP=mypy-python:
https://pre-commit.com/#temporarily-disabling-hooks
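For example, pre-commit's standard SKIP environment variable can bypass the failing hook for a single invocation while leaving the other hooks active (the commit message below is just a placeholder):

```shell
# Skip only the mypy-python hook for this one commit;
# all other pre-commit hooks still run as usual.
SKIP=mypy-python git commit -m "your commit message"

# Or when running pre-commit directly:
SKIP=mypy-python pre-commit run --all-files
```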