sqlglot.errors.ParseError for sample from website
samiabboud opened this issue · 22 comments
Hello everyone,
Running a sample copied and pasted from the documentation site raises a sqlglot.errors.ParseError. Could this be a version issue? Please see the details below.
Your help is appreciated!
Cheers,
Sami
Environment details
- OS type and version: macOS Sonoma 14.2.1 (on M2 Max)
- Python version (python --version): 3.9.18
- pip version (pip --version): pip 23.0.1
- bigframes version (pip show bigframes): 0.19.0
Steps to reproduce
- Run the code sample from https://cloud.google.com/bigquery/docs/bigquery-dataframes#bigframes-ml-regression after adding a project ID
Code example
from bigframes.ml.linear_model import LinearRegression
import bigframes.pandas as bpd
bpd.options.bigquery.project = "our_project_id"
# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)
# Filter the data down to the Adelie Penguin species
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]
# Drop the species column
adelie_data = adelie_data.drop(columns=["species"])
# Drop rows with nulls to get training data
training_data = adelie_data.dropna()
# Specify your feature (or input) columns and the label (or output) column:
feature_columns = training_data[
["island", "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "sex"]
]
label_columns = training_data[["body_mass_g"]]
test_data = adelie_data[adelie_data.body_mass_g.isnull()]
# Create the linear model
model = LinearRegression()
model.fit(feature_columns, label_columns)
# Score the model
score = model.score(feature_columns, label_columns)
# Predict using the model
result = model.predict(test_data)
Stack trace
% python src/bq_run.py
Query job bb049054-a4f0-4d88-b128-b97eb020038b is DONE. 28.9 kB processed.
https://console.cloud.google.com/bigquery?project=platform-dev-285607&j=bq:US:bb049054-a4f0-4d88-b128-b97eb020038b&page=queryresults
Traceback (most recent call last):
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1039, in parse_into
return self._parse(parser, raw_tokens, sql)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1078, in _parse
self.raise_error("Invalid expression / Unexpected token")
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1119, in raise_error
raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 61.
platform-dev-285607._21e83bdd53455fdc8544000e45591de500adacc2.anon0277dfb0_f1fc_47b2_a519_1493d286435f
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/samiabboud/dev/aampe/modeling/src/bq_run.py", line 33, in <module>
model.fit(feature_columns, label_columns)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/base.py", line 162, in fit
return self._fit(X, y)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
return method(*args, **kwargs)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/linear_model.py", line 136, in _fit
self._bqml_model = self._bqml_model_factory.create_model(
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/core.py", line 245, in create_model
input_data = X_train._cached().join(y_train._cached(), how="outer")
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
return method(*args, **kwargs)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/dataframe.py", line 3045, in _cached
self._set_block(self._block.cached())
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/blocks.py", line 1677, in cached
self.session._execute_and_cache(self.expr, cluster_cols=self.index_columns),
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/session/__init__.py", line 1479, in _execute_and_cache
table_expression = self.ibis_client.table(
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/ibis/backends/bigquery/__init__.py", line 509, in table
table = sg.parse_one(name, into=sg.exp.Table, read=self.name)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/__init__.py", line 124, in parse_one
result = dialect.parse_into(into, sql, **opts)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 325, in parse_into
return self.parser(**opts).parse_into(expression_type, self.tokenize(sql), sql)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1044, in parse_into
raise ParseError(
sqlglot.errors.ParseError: Failed to parse 'platform-dev-285607._21e83bdd53455fdc8544000e45591de500adacc2.anon0277dfb0_f1fc_47b2_a519_1493d286435f' into <class 'sqlglot.expressions.Table'>
Hi @samiabboud, could you check which versions of ibis-framework and sqlglot you are using? It might be that ibis forgets to add backticks (`) to escape the table names, which can happen with older versions of these packages.
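To make the escaping idea concrete: the failing names above have the form project.dataset.table with a hyphenated project ID (for example platform-dev-285607), which is the kind of name a SQL parser can stumble on, and wrapping each part in backticks removes the ambiguity. The helper below is a hypothetical, plain-Python illustration of that quoting; it is not an ibis, sqlglot, or bigframes API, and it assumes exactly three dot-separated parts with no dots inside any part.

```python
def quote_bigquery_table_id(table_id: str) -> str:
    """Wrap each part of a `project.dataset.table` ID in backticks.

    Hypothetical helper for illustration only (not part of ibis/sqlglot/bigframes).
    Assumes exactly three dot-separated, non-empty parts.
    """
    parts = table_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected project.dataset.table, got {table_id!r}")
    # Escape backslashes first, then backticks, so the quoted form stays unambiguous.
    escaped = [p.replace("\\", "\\\\").replace("`", "\\`") for p in parts]
    return ".".join(f"`{p}`" for p in escaped)


if __name__ == "__main__":
    print(quote_bigquery_table_id("my-project.my_dataset.my_table"))
```

For the sample input this prints `my-project`.`my_dataset`.`my_table`, where the hyphens now sit safely inside a quoted identifier.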
Thank you for the quick response, @ashleyxuu. Here are the versions I have installed (as part of the latest bigframes, I believe).
% pip show sqlglot
Name: sqlglot
Version: 19.9.0
Summary: An easily customizable SQL parser and transpiler
Home-page: https://github.com/tobymao/sqlglot
Author: Toby Mao
Author-email: toby.mao@gmail.com
License: MIT
Location: /Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages
Requires:
Required-by: ibis-framework
(venv) samiabboud@macpro modeling % pip show ibis-framework
Name: ibis-framework
Version: 7.1.0
Summary: Productivity-centric Python Big Data Framework
Home-page: https://ibis-project.org
Author: Ibis Maintainers
Author-email: maintainers@ibis-project.org
License: Apache-2.0
Location: /Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages
Requires: atpublic, bidict, filelock, multipledispatch, numpy, pandas, parsy, pins, pyarrow, pyarrow-hotfix, python-dateutil, pytz, rich, sqlglot, toolz, typing-extensions
Required-by: bigframes
Upgrading sqlglot to 20.8.0 doesn't resolve the error.
Upgrading ibis-framework results in the following dependency error:
bigframes 0.19.0 requires ibis-framework[bigquery]<7.2.0dev,>=7.1.0, but you have ibis-framework 7.2.0 which is incompatible.
+1
Could you try upgrading bigframes to the latest version, 0.19.2, and see if you can still reproduce the problem? For reference, here are the versions in my environment:
(venv) $ pip show sqlglot
Name: sqlglot
Version: 20.4.0
Summary: An easily customizable SQL parser and transpiler
Home-page: https://github.com/tobymao/sqlglot
Author: Toby Mao
Author-email: toby.mao@gmail.com
License: MIT
Location: /usr/local/google/home/ashleyxu/src/python-bigquery-dataframes/venv/lib/python3.10/site-packages
Requires:
Required-by: ibis-framework
(venv) $ pip show ibis-framework
Name: ibis-framework
Version: 7.2.0
Summary: Productivity-centric Python Big Data Framework
Home-page: https://ibis-project.org
Author: Ibis Maintainers
Author-email: maintainers@ibis-project.org
License: Apache-2.0
Location: /usr/local/google/home/ashleyxu/src/python-bigquery-dataframes/venv/lib/python3.10/site-packages
Requires: atpublic, bidict, filelock, multipledispatch, numpy, pandas, parsy, pins, pyarrow, pyarrow-hotfix, python-dateutil, pytz, rich, sqlglot, toolz, typing-extensions
Required-by: bigframes
I already updated these modules, but the issue was not resolved.
Same issue in this Cloud Skills Boost lab with Colab Enterprise: https://partner.cloudskillsboost.google/course_sessions/11372701/labs/448513
+1
+1
+1
For those facing this issue, please try reinstalling the package with pip, then restart your runtime by following these steps and run the notebook again:
- Expand the menu.
- Select Runtime
- Select Restart session
If this still doesn't work for you, you can try creating a brand-new project. (The issue might be related to permissions or cross-region access.)
We're working on fixing it. Internal issue number: 326126888
Hi @ashleyxuu,
Code:
import bigframes.pandas as bf  # assumed: bf refers to bigframes.pandas in the original notebook
from bigframes.ml.preprocessing import OneHotEncoder
enc = OneHotEncoder()
X = bf.DataFrame({"a": ["Male", "Female", "Female"], "b": ["1", "3", "2"]})
enc.fit(X)
print(enc.transform(bf.DataFrame({"a": ["Female", "Male"], "b": ["1", "4"]})))
Error:
ParseError: Invalid expression / Unexpected token.
The above exception was the direct cause of the following exception:
ParseError Traceback (most recent call last)
Cell In[7], line 5
3 enc = OneHotEncoder()
4 X = bf.DataFrame({"a": ["Male", "Female", "Female"], "b": ["1", "3", "2"]})
----> 5 enc.fit(X)
6 print(enc.transform(bf.DataFrame({"a": ["Female", "Male"], "b": ["1", "4"]})))
I am waiting for a fix. Thanks.
@uysalfurkan Thank you so much for reporting this issue! I apologize for the delay in responding. To help me track down this dependency mismatch, could you please provide the full call stack? Also, to reproduce the error on our side, would you mind sharing your environment details? You can generate these using the following command:
import sys
!{sys.executable} -m pip freeze
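If the shell escape is not available, roughly the same information can be collected from Python itself. The sketch below uses the standard library's importlib.metadata to list name==version lines, filtered to the packages that appear in this thread's tracebacks; that package list is an assumption, and the output only reflects the environment of the interpreter running the code.

```python
import re
from importlib.metadata import distributions
from typing import Iterable, List

# Packages that appear in this thread's tracebacks (assumed list; adjust as needed).
RELEVANT = ("bigframes", "ibis-framework", "sqlglot")


def normalize(name: str) -> str:
    """PEP 503-style name normalization, so 'Ibis_Framework' matches 'ibis-framework'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def freeze_subset(wanted: Iterable[str] = RELEVANT) -> List[str]:
    """Return sorted 'name==version' lines for installed distributions in `wanted`."""
    wanted_norm = {normalize(n) for n in wanted}
    lines = set()  # a set collapses duplicates if a package is visible twice on sys.path
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and normalize(name) in wanted_norm:
            lines.add(f"{name}=={dist.version}")
    return sorted(lines)


if __name__ == "__main__":
    print("\n".join(freeze_subset()) or "none of the listed packages are installed")
```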
In addition to #315 (comment), could you also please run the following cell:
import bigframes
bigframes.__version__
This will help us confirm that the version of bigframes in the notebook matches the version reported by pip.
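The check requested above compares two sources of truth: the module's __version__ attribute and the installed distribution metadata that pip reports. A mismatch often means the kernel is importing from a different environment than the one pip installed into, or the runtime was not restarted after an upgrade. A minimal sketch, assuming the module exposes __version__ (bigframes does, as the snippet above relies on); version_pair is a hypothetical helper name.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def version_pair(dist_name: str, module_name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (pip-reported version, module __version__); None where unavailable."""
    try:
        pip_version = version(dist_name)
    except PackageNotFoundError:
        pip_version = None
    try:
        module = importlib.import_module(module_name)
        module_version = getattr(module, "__version__", None)
    except ImportError:  # also covers ModuleNotFoundError
        module_version = None
    return pip_version, module_version


if __name__ == "__main__":
    pip_v, mod_v = version_pair("bigframes", "bigframes")
    if pip_v is None and mod_v is None:
        print("bigframes is not installed in this environment")
    elif pip_v != mod_v:
        print(f"Mismatch: pip reports {pip_v}, the module reports {mod_v}; try restarting the runtime")
    else:
        print(f"bigframes {mod_v} (pip and the module agree)")
```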
@chelsea-lin, @tswast, sorry for the delayed response.
Environment details:
environment.txt
Full call stack:
Traceback (most recent call last):
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1056, in parse_into
return self._parse(parser, raw_tokens, sql)
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1095, in _parse
self.raise_error("Invalid expression / Unexpected token")
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1136, in raise_error
raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 64.
encoded-hangout-414110._46f61dc8b3e2eb2697eb7be8fa45757c2d44aebe.anon271366cfceeb965c764bc43446c057e69691f83157ca78394ecb85df7904eb22
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/folders/wn/fkylpmg57j11htwlk6n249xr0000gp/T/ipykernel_37862/2695306357.py", line 5, in <module>
enc.fit(X)
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
return method(*args, **kwargs)
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/ml/preprocessing.py", line 510, in fit
self._bqml_model = self._bqml_model_factory.create_model(
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/ml/core.py", line 243, in create_model
input_data = X_train._cached()
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
return method(*args, **kwargs)
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/dataframe.py", line 3045, in _cached
self._set_block(self._block.cached())
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/blocks.py", line 1677, in cached
self.session._execute_and_cache(self.expr, cluster_cols=self.index_columns),
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/session/__init__.py", line 1479, in _execute_and_cache
table_expression = self.ibis_client.table(
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/ibis/backends/bigquery/__init__.py", line 509, in table
table = sg.parse_one(name, into=sg.exp.Table, read=self.name)
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/__init__.py", line 123, in parse_one
result = dialect.parse_into(into, sql, **opts)
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 447, in parse_into
return self.parser(**opts).parse_into(expression_type, self.tokenize(sql), sql)
File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1061, in parse_into
raise ParseError(
sqlglot.errors.ParseError: Failed to parse 'encoded-hangout-414110._46f61dc8b3e2eb2697eb7be8fa45757c2d44aebe.anon271366cfceeb965c764bc43446c057e69691f83157ca78394ecb85df7904eb22' into <class 'sqlglot.expressions.Table'>
Re-opening, as it appears this issue occurs even with the latest supported sqlglot version.
A thought: ibis 7.x and 8.x only use sqlglot in the BigQuery backend for unnest support. This isn't currently used by BigQuery DataFrames. We might be able to monkeypatch out the problematic code to avoid this.
I'm having a similar issue even with basic operations like bq_df.head(). This is all taking place in the BigQuery notebook interface with the default runtime. Here's the full call stack:
ParseError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/sqlglot/parser.py in parse_into(self, expression_types, raw_tokens, sql)
1038 try:
-> 1039 return self._parse(parser, raw_tokens, sql)
1040 except ParseError as e:
ParseError: Invalid expression / Unexpected token. Line 1, Col: 60.
TABLE_REDACTED
The above exception was the direct cause of the following exception:
ParseError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/sqlglot/parser.py in parse_into(self, expression_types, raw_tokens, sql)
1042 errors.append(e)
1043
-> 1044 raise ParseError(
1045 f"Failed to parse '{sql or raw_tokens}' into {expression_types}",
1046 errors=merge_errors(errors),
ParseError: Failed to parse 'TABLE_REDACTED' into <class 'sqlglot.expressions.Table'>
import bigframes
bigframes.__version__
0.21.0
import sqlglot
sqlglot.__version__
19.9.0
@ZeroCool2u Thanks for the report. My teammate @chelsea-lin was able to determine that there is a bug in sqlglot's parsing of BigQuery table IDs, which has been reported and will hopefully be fixed in a future release. In the meantime, I believe bigframes 0.22.0 works around this issue. Could you please try that version and report back on whether it is fixed?
In a notebook:
%pip install --upgrade bigframes
And then restart your notebook runtime.
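After restarting, one way to confirm the runtime actually picked up a release with the workaround is to compare the installed version against 0.22.0, the version suggested above. The sketch below is a simplification that assumes plain X.Y.Z release numbers and ignores suffixes such as dev or rc; for full PEP 440 semantics, the third-party packaging library's Version class is the usual tool. The names MIN_BIGFRAMES, release_tuple, and has_workaround are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Release suggested in this thread as including the workaround.
MIN_BIGFRAMES = (0, 22, 0)


def release_tuple(v: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '0.22.0' -> (0, 22, 0).

    Simplification: suffixes such as '.dev1' or 'rc1' are ignored, so a
    pre-release of 0.22.0 is treated as 0.22.0.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_workaround(installed: str, minimum: Tuple[int, ...] = MIN_BIGFRAMES) -> bool:
    parts = release_tuple(installed)
    # Pad short versions so '0.22' compares equal to (0, 22, 0).
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= minimum


if __name__ == "__main__":
    try:
        installed = version("bigframes")
    except PackageNotFoundError:
        print("bigframes is not installed in this environment")
    else:
        status = "includes" if has_workaround(installed) else "predates"
        print(f"bigframes {installed} {status} the 0.22.0 workaround")
```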
@tswast @chelsea-lin Just tested and seems fixed! Thank you!!