lightdash/dbt2looker

Non-empty models cannot be parsed and are reported as empty

Tbodewes opened this issue · 2 comments

As of version 0.9.2, dbt2looker will not run for us anymore. v0.7.0 does run successfully. The error returned by 0.9.2 is 'Cannot parse model with id: "%s" - is the model file empty?'. However, the model that this is returned for is not empty. Based on the code, it seems like the attribute 'name' is missing, but inspecting the manifest.json file shows that there is actually a name for this model. I have no idea why the system reports these models as empty. The manifest.json object for one of the offending models is pasted below.

Reverting to v0.9.0 (which does not yet have this error message) just leads to dbt2looker crashing without any information. Reverting to 0.7.0 fixes the problem. This issue effectively locks us (and likely others) into using an old version of dbt2looker

"model.zivver_dwh.crm_account_became_customer_dates":
        {
            "raw_sql": "WITH sfdc_accounts AS (\r\n\r\n    SELECT * FROM {{ ref('stg_sfdc_accounts') }}\r\n\r\n), crm_opportunities AS (\r\n\r\n    SELECT * FROM {{ ref('crm_opportunities') }}\r\n\r\n), crm_account_lifecycle_stage_changes_into_customer_observed AS (\r\n\r\n    SELECT\r\n        *\r\n    FROM {{ ref('crm_account_lifecycle_stage_changes_observed') }}\r\n    WHERE\r\n        new_stage = 'CUSTOMER'\r\n\r\n), became_customer_dates_from_opportunities AS (\r\n\r\n    SELECT\r\n        crm_account_id AS sfdc_account_id,\r\n\r\n        -- An account might have multiple opportunities. The account became customer when the first one was closed won.\r\n        MIN(closed_at) AS became_customer_at\r\n    FROM crm_opportunities\r\n    WHERE\r\n        opportunity_stage = 'CLOSED_WON'\r\n    GROUP BY\r\n        1\r\n\r\n), became_customer_dates_observed AS (\r\n\r\n    -- Some accounts might not have closed won opportunities, but still be a customer. Examples would be Connect4Care\r\n    -- customers, which have a single opportunity which applies to multiple accounts. If an account is manually set\r\n    -- to customer, this should also count as a customer.\r\n    --\r\n    -- We try to get the date at which they became a customer from the property history. Since that wasn't on from\r\n    -- the beginning, we conservatively default to either the creation date of the account or the history tracking\r\n    -- start date, whichever was earlier. Please note that this case should be exceedingly rare.\r\n    SELECT\r\n        sfdc_accounts.sfdc_account_id,\r\n        CASE\r\n            WHEN {{ var('date:sfdc:account_history_tracking:start_date') }} <= sfdc_accounts.created_at\r\n                THEN sfdc_accounts.created_at\r\n            ELSE {{ var('date:sfdc:account_history_tracking:start_date') }}\r\n        END AS default_became_customer_date,\r\n\r\n        COALESCE(\r\n            MIN(crm_account_lifecycle_stage_changes_into_customer_observed.new_stage_entered_at),\r\n            default_became_customer_date\r\n        ) AS became_customer_at\r\n\r\n    FROM sfdc_accounts\r\n    LEFT JOIN crm_account_lifecycle_stage_changes_into_customer_observed\r\n        ON sfdc_accounts.sfdc_account_id = crm_account_lifecycle_stage_changes_into_customer_observed.sfdc_account_id\r\n    WHERE\r\n        sfdc_accounts.lifecycle_stage = 'CUSTOMER'\r\n    GROUP BY\r\n        1,\r\n        2\r\n\r\n)\r\nSELECT\r\n    COALESCE(became_customer_dates_from_opportunities.sfdc_account_id,\r\n        became_customer_dates_observed.sfdc_account_id) AS sfdc_account_id,\r\n    COALESCE(became_customer_dates_from_opportunities.became_customer_at,\r\n        became_customer_dates_observed.became_customer_at) AS became_customer_at\r\nFROM became_customer_dates_from_opportunities\r\nFULL OUTER JOIN became_customer_dates_observed\r\n    ON became_customer_dates_from_opportunities.sfdc_account_id = became_customer_dates_observed.sfdc_account_id",
            "resource_type": "model",
            "depends_on":
            {
                "macros":
                [
                    "macro.zivver_dwh.ref",
                    "macro.zivver_dwh.audit_model_deployment_started",
                    "macro.zivver_dwh.audit_model_deployment_completed",
                    "macro.zivver_dwh.grant_read_rights_to_role"
                ],
                "nodes":
                [
                    "model.zivver_dwh.stg_sfdc_accounts",
                    "model.zivver_dwh.crm_opportunities",
                    "model.zivver_dwh.crm_account_lifecycle_stage_changes_observed"
                ]
            },
            "config":
            {
                "enabled": true,
                "materialized": "ephemeral",
                "persist_docs":
                {},
                "vars":
                {},
                "quoting":
                {},
                "column_types":
                {},
                "alias": null,
                "schema": "bl",
                "database": null,
                "tags":
                [
                    "business_layer",
                    "commercial"
                ],
                "full_refresh": null,
                "crm_record_types": null,
                "post-hook":
                [
                    {
                        "sql": "{{ audit_model_deployment_completed() }}",
                        "transaction": true,
                        "index": null
                    },
                    {
                        "sql": "{{ grant_read_rights_to_role('data_engineer', ['all']) }}",
                        "transaction": true,
                        "index": null
                    },
                    {
                        "sql": "{{ grant_read_rights_to_role('analyst', ['all']) }}",
                        "transaction": true,
                        "index": null
                    }
                ],
                "pre-hook":
                [
                    {
                        "sql": "{{ audit_model_deployment_started() }}",
                        "transaction": true,
                        "index": null
                    }
                ]
            },
            "database": "analytics",
            "schema": "bl",
            "fqn":
            [
                "zivver_dwh",
                "business_layer",
                "commercial",
                "crm_account_lifecycle_stage_changes",
                "intermediates",
                "crm_account_became_customer_dates",
                "crm_account_became_customer_dates"
            ],
            "unique_id": "model.zivver_dwh.crm_account_became_customer_dates",
            "package_name": "zivver_dwh",
            "root_path": "C:\\Users\\tjebbe.bodewes\\Documents\\zivver-dwh\\dwh\\transformations",
            "path": "business_layer\\commercial\\crm_account_lifecycle_stage_changes\\intermediates\\crm_account_became_customer_dates\\crm_account_became_customer_dates.sql",
            "original_file_path": "models\\business_layer\\commercial\\crm_account_lifecycle_stage_changes\\intermediates\\crm_account_became_customer_dates\\crm_account_became_customer_dates.sql",
            "name": "crm_account_became_customer_dates",
            "alias": "crm_account_became_customer_dates",
            "checksum":
            {
                "name": "sha256",
                "checksum": "a037b5681219d90f8bf8d81641d3587f899501358664b8ec77168901b3e1808b"
            },
            "tags":
            [
                "business_layer",
                "commercial"
            ],
            "refs":
            [
                [
                    "stg_sfdc_accounts"
                ],
                [
                    "crm_opportunities"
                ],
                [
                    "crm_account_lifecycle_stage_changes_observed"
                ]
            ],
            "sources":
            [],
            "description": "",
            "columns":
            {
                "sfdc_account_id":
                {
                    "name": "sfdc_account_id",
                    "description": "",
                    "meta":
                    {},
                    "data_type": null,
                    "quote": null,
                    "tags":
                    []
                },
                "became_customer_at":
                {
                    "name": "became_customer_at",
                    "description": "",
                    "meta":
                    {},
                    "data_type": null,
                    "quote": null,
                    "tags":
                    []
                }
            },
            "meta":
            {},
            "docs":
            {
                "show": true
            },
            "patch_path": "zivver_dwh://models\\business_layer\\commercial\\crm_account_lifecycle_stage_changes\\intermediates\\crm_account_became_customer_dates\\crm_account_became_customer_dates.yml",
            "compiled_path": null,
            "build_path": null,
            "deferred": false,
            "unrendered_config":
            {
                "pre-hook":
                [
                    "{{ audit_model_deployment_started() }}"
                ],
                "post-hook":
                [
                    "{{ grant_read_rights_to_role('analyst', ['all']) }}"
                ],
                "tags":
                [
                    "commercial"
                ],
                "materialized": "ephemeral",
                "schema": "bl",
                "crm_record_types": null
            },
            "created_at": 1637233875
        }

So I have been looking at this, and it seems to be stemming from the way the manifest is parsed. The code implies the all_models variable is a list containing DbtModel objects however it appears that some DbtNode objects are making their way in there. DbtModel is what has the name property so that could be why this error is thrown, as the presence of that attribute is considered before failing.

The method below I think might need looking at to validate whether the all_models contains only DbtModel objects...

def parse_models(raw_manifest: dict, tag=None) -> List[models.DbtModel]:
    manifest = models.DbtManifest(**raw_manifest)
    all_models: List[models.DbtModel] = [
        node
        for node in manifest.nodes.values()
        if node.resource_type == 'model'
    ]


    # Empty model files have many missing parameters
    for model in all_models:
        if not hasattr(model, 'name'):
            logging.error('Cannot parse model with id: "%s" - is the model file empty?', model.unique_id)
            raise SystemExit('Failed')


    if tag is None:
        return all_models
    return [model for model in all_models if tags_match(tag, model)]

That's where my knowledge stops, so if anyone else has insight in to how the manifest object populates feel free to add that knowledge. Something must have changed between v0.7.0 and the latest to warrant this issue

Actually, I wonder if this has anything to do with the model in question being materialized as ephemeral... I would imagine you don't want lookml views to generated for ephemeral models, but v0.7.0 allows for that. In my case, I am getting the same error on ephemeral models...