DataRecce/recce

[DRC-336] [Bug] Unable to run Top-K diff due to BigQuery error

Closed this issue · 2 comments

Current Behavior

Attempting to run a Top-K diff returns a Error: local variable 'base' referenced before assignment error. This appears to be the case across all models and all columns, regardless of the data type:

image

Expected Behavior

Recce returns a Top-K diff analysis

Steps To Reproduce

  1. Create dbt artifacts and start the Recce server according to the docs
  2. Select a model in the Lineage tab
  3. Select "Advanced Diffs" > "Top-K Diff"
  4. Select any column from the drop down menu and click "Execute"
  5. See error

Relevant log output

INFO:     127.0.0.1:56276 - "GET /api/runs/db35bbf9-b821-4544-b390-7a56de61e94c/wait?timeout=2 HTTP/1.1" 200 OK
19:46:55  BigQuery adapter: https://console.cloud.google.com/bigquery?project=inq-warehouse&j=bq:US:4920f4a2-1873-4ed7-bd4a-008fc645d69a&page=queryresults
Failed to query the BASE top-k result Database Error
  Syntax error: Unexpected keyword UNNEST at [4:12]
Future exception was never retrieved
future: <Future finished exception=UnboundLocalError("local variable 'base' referenced before assignment")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/workspaces/inquirer-dbt/.venv/lib/python3.9/site-packages/recce/apis/run_func.py", line 79, in fn
    raise e
  File "/workspaces/inquirer-dbt/.venv/lib/python3.9/site-packages/recce/apis/run_func.py", line 68, in fn
    result = task.execute()
  File "/workspaces/inquirer-dbt/.venv/lib/python3.9/site-packages/recce/tasks/top_k.py", line 116, in execute
    if base is not None:
UnboundLocalError: local variable 'base' referenced before assignment

Environment

  • recce: 0.10.0
  • OS: linux-gnu
  • Python: 3.9.17
  • Data Warehouse: BigQuery
  • dbt: 1.7.10 (BigQuery plugin 1.7.6)

Additional Context

Here is the generated BigQuery query that fails to run (it returns Syntax error: Unexpected keyword UNNEST at [4:12]):

/* {"app": "dbt", "dbt_version": "1.7.10", "profile_name": "inquirer-dbt", "target_name": "dev", "connection_name": "query"} */

WITH value_list AS (
    SELECT unnest(ARRAY['2024-02-28', '2024-02-27', '2024-03-06', '2024-02-29', '2024-03-05', '2024-03-04', '2024-03-19', '2024-03-07', '2024-03-18', '2024-03-01', '2024-03-20', '2024-03-13', '2024-03-21', '2024-03-12', '2024-03-14', '2024-03-25', '2024-03-11', '2024-03-15', '2024-03-08', '2024-03-22', '2024-03-02', '2024-03-03', '2024-03-17', '2024-03-09', '2024-03-23', '2024-03-16', '2024-03-10', '2024-03-26']) AS date, -- Replace these with your specific values
    unnest(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]) AS order_seq
),
aggregated_data AS (
    SELECT
        date,
        COUNT(*) AS frequency
    FROM `inq-warehouse`.`analytics`.`article_performance_vestapol_team_desk`
    WHERE date IN ('2024-02-28', '2024-02-27', '2024-03-06', '2024-02-29', '2024-03-05', '2024-03-04', '2024-03-19', '2024-03-07', '2024-03-18', '2024-03-01', '2024-03-20', '2024-03-13', '2024-03-21', '2024-03-12', '2024-03-14', '2024-03-25', '2024-03-11', '2024-03-15', '2024-03-08', '2024-03-22', '2024-03-02', '2024-03-03', '2024-03-17', '2024-03-09', '2024-03-23', '2024-03-16', '2024-03-10', '2024-03-26') -- This ensures the aggregation only considers your values
    GROUP BY date
)

SELECT
    vl.date,
    COALESCE(ad.frequency, 0) AS frequency
FROM value_list vl
LEFT JOIN aggregated_data ad ON vl.date = ad.date
ORDER BY vl.order_seq

image

From SyncLinear.com | DRC-336

@waligob

Thanks for your feedback. We will change to use more compatible way to query the top categories.

This issue can be reproduced
image