[DRC-336] [Bug] Unable to run Top-K diff due to BigQuery error
Closed this issue · 2 comments
waligob commented
Current Behavior
Attempting to run a Top-K diff returns a Error: local variable 'base' referenced before assignment
error. This appears to be the case across all models and all columns, regardless of the data type:
Expected Behavior
Recce returns a Top-K diff analysis
Steps To Reproduce
- Create dbt artifacts and start the Recce server according to the docs
- Select a model in the Lineage tab
- Select "Advanced Diffs" > "Top-K Diff"
- Select any column from the drop down menu and click "Execute"
- See error
Relevant log output
INFO: 127.0.0.1:56276 - "GET /api/runs/db35bbf9-b821-4544-b390-7a56de61e94c/wait?timeout=2 HTTP/1.1" 200 OK
19:46:55 BigQuery adapter: https://console.cloud.google.com/bigquery?project=inq-warehouse&j=bq:US:4920f4a2-1873-4ed7-bd4a-008fc645d69a&page=queryresults
Failed to query the BASE top-k result Database Error
Syntax error: Unexpected keyword UNNEST at [4:12]
Future exception was never retrieved
future: <Future finished exception=UnboundLocalError("local variable 'base' referenced before assignment")>
Traceback (most recent call last):
File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/workspaces/inquirer-dbt/.venv/lib/python3.9/site-packages/recce/apis/run_func.py", line 79, in fn
raise e
File "/workspaces/inquirer-dbt/.venv/lib/python3.9/site-packages/recce/apis/run_func.py", line 68, in fn
result = task.execute()
File "/workspaces/inquirer-dbt/.venv/lib/python3.9/site-packages/recce/tasks/top_k.py", line 116, in execute
if base is not None:
UnboundLocalError: local variable 'base' referenced before assignment
Environment
- recce:
0.10.0
- OS:
linux-gnu
- Python:
3.9.17
- Data Warehouse: BigQuery
- dbt:
1.7.10
(BigQuery plugin1.7.6
)
Additional Context
Here is the generated BigQuery query that fails to run (it returns Syntax error: Unexpected keyword UNNEST at [4:12]
):
/* {"app": "dbt", "dbt_version": "1.7.10", "profile_name": "inquirer-dbt", "target_name": "dev", "connection_name": "query"} */
WITH value_list AS (
SELECT unnest(ARRAY['2024-02-28', '2024-02-27', '2024-03-06', '2024-02-29', '2024-03-05', '2024-03-04', '2024-03-19', '2024-03-07', '2024-03-18', '2024-03-01', '2024-03-20', '2024-03-13', '2024-03-21', '2024-03-12', '2024-03-14', '2024-03-25', '2024-03-11', '2024-03-15', '2024-03-08', '2024-03-22', '2024-03-02', '2024-03-03', '2024-03-17', '2024-03-09', '2024-03-23', '2024-03-16', '2024-03-10', '2024-03-26']) AS date, -- Replace these with your specific values
unnest(ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]) AS order_seq
),
aggregated_data AS (
SELECT
date,
COUNT(*) AS frequency
FROM `inq-warehouse`.`analytics`.`article_performance_vestapol_team_desk`
WHERE date IN ('2024-02-28', '2024-02-27', '2024-03-06', '2024-02-29', '2024-03-05', '2024-03-04', '2024-03-19', '2024-03-07', '2024-03-18', '2024-03-01', '2024-03-20', '2024-03-13', '2024-03-21', '2024-03-12', '2024-03-14', '2024-03-25', '2024-03-11', '2024-03-15', '2024-03-08', '2024-03-22', '2024-03-02', '2024-03-03', '2024-03-17', '2024-03-09', '2024-03-23', '2024-03-16', '2024-03-10', '2024-03-26') -- This ensures the aggregation only considers your values
GROUP BY date
)
SELECT
vl.date,
COALESCE(ad.frequency, 0) AS frequency
FROM value_list vl
LEFT JOIN aggregated_data ad ON vl.date = ad.date
ORDER BY vl.order_seq
From SyncLinear.com | DRC-336
popcornylu commented
Thanks for your feedback. We will change to use more compatible way to query the top categories.
popcornylu commented