googleapis/python-bigquery-dataframes

sqlglot.errors.ParseError for sample from website

samiabboud opened this issue · 22 comments

Hello everyone,

Running a copy/pasted sample from the site is raising a sqlglot.errors.ParseError. Version issues maybe? Please see details below.

Your help is appreciated!

Cheers,
Sami

Environment details

  • OS type and version: Sonoma 14.2.1 (on M2 Max)
  • Python version: python --version 3.9.18
  • pip version: pip --version pip 23.0.1
  • bigframes version: pip show bigframes 0.19.0

Steps to reproduce

  1. Run the code sample from https://cloud.google.com/bigquery/docs/bigquery-dataframes#bigframes-ml-regression after adding a project ID.

Code example

from bigframes.ml.linear_model import LinearRegression
import bigframes.pandas as bpd

bpd.options.bigquery.project = "our_project_id"

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Filter the data down to the Adelie Penguin species
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the species column
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get training data
training_data = adelie_data.dropna()

# Specify your feature (or input) columns and the label (or output) column:
feature_columns = training_data[
    ["island", "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "sex"]
]
label_columns = training_data[["body_mass_g"]]

test_data = adelie_data[adelie_data.body_mass_g.isnull()]

# Create the linear model
model = LinearRegression()
model.fit(feature_columns, label_columns)

# Score the model
score = model.score(feature_columns, label_columns)

# Predict using the model
result = model.predict(test_data)
# example

Stack trace

% python src/bq_run.py
Query job bb049054-a4f0-4d88-b128-b97eb020038b is DONE.28.9 kB processed.  
https://console.cloud.google.com/bigquery?project=platform-dev-285607&j=bq:US:bb049054-a4f0-4d88-b128-b97eb020038b&page=queryresults
Traceback (most recent call last):
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1039, in parse_into
    return self._parse(parser, raw_tokens, sql)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1078, in _parse
    self.raise_error("Invalid expression / Unexpected token")
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1119, in raise_error
    raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 61.
  platform-dev-285607._21e83bdd53455fdc8544000e45591de500adacc2.anon0277dfb0_f1fc_47b2_a519_1493d286435f

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/samiabboud/dev/aampe/modeling/src/bq_run.py", line 33, in <module>
    model.fit(feature_columns, label_columns)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/base.py", line 162, in fit
    return self._fit(X, y)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/linear_model.py", line 136, in _fit
    self._bqml_model = self._bqml_model_factory.create_model(
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/core.py", line 245, in create_model
    input_data = X_train._cached().join(y_train._cached(), how="outer")
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/dataframe.py", line 3045, in _cached
    self._set_block(self._block.cached())
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/blocks.py", line 1677, in cached
    self.session._execute_and_cache(self.expr, cluster_cols=self.index_columns),
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/session/__init__.py", line 1479, in _execute_and_cache
    table_expression = self.ibis_client.table(
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/ibis/backends/bigquery/__init__.py", line 509, in table
    table = sg.parse_one(name, into=sg.exp.Table, read=self.name)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/__init__.py", line 124, in parse_one
    result = dialect.parse_into(into, sql, **opts)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 325, in parse_into
    return self.parser(**opts).parse_into(expression_type, self.tokenize(sql), sql)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1044, in parse_into
    raise ParseError(
sqlglot.errors.ParseError: Failed to parse 'platform-dev-285607._21e83bdd53455fdc8544000e45591de500adacc2.anon0277dfb0_f1fc_47b2_a519_1493d286435f' into <class 'sqlglot.expressions.Table'>

Hi @samiabboud, could you check which versions of ibis-framework and sqlglot you are using? It might be that ibis forgets to add backticks (`) to escape names. That can happen with older versions of these packages.
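For readers unfamiliar with the escaping mentioned above, here is a minimal sketch of what backtick-quoting a BigQuery table ID looks like. The helper `quote_table_id` is hypothetical and written only for illustration; it is not part of ibis, sqlglot, or bigframes. It assumes a plain `project.dataset.table` ID whose parts contain no dots or backticks. Whether quoting alone avoids the ParseError depends on the sqlglot version and is not verified here.

```python
def quote_table_id(table_id: str) -> str:
    """Wrap each dot-separated part of a BigQuery table ID in backticks.

    Hypothetical helper for illustration; not an ibis or bigframes API.
    Assumes exactly three non-empty parts: project, dataset, table.
    """
    parts = table_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {table_id!r}")
    if any("`" in part for part in parts):
        raise ValueError("parts must not contain backticks")
    return ".".join(f"`{part}`" for part in parts)


# Table ID taken from the stack trace above.
table_id = (
    "platform-dev-285607."
    "_21e83bdd53455fdc8544000e45591de500adacc2."
    "anon0277dfb0_f1fc_47b2_a519_1493d286435f"
)
print(quote_table_id(table_id))
```

Quoting each part separately keeps the hyphenated project ID and the underscore-prefixed dataset name from being read as anything other than identifiers.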

Thank you for the quick response, @ashleyxuu. Here are the versions I have installed (pulled in by the latest bigframes, I believe).

% pip show sqlglot
Name: sqlglot
Version: 19.9.0
Summary: An easily customizable SQL parser and transpiler
Home-page: https://github.com/tobymao/sqlglot
Author: Toby Mao
Author-email: toby.mao@gmail.com
License: MIT
Location: /Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages
Requires: 
Required-by: ibis-framework
(venv) samiabboud@macpro modeling % pip show ibis-framework
Name: ibis-framework
Version: 7.1.0
Summary: Productivity-centric Python Big Data Framework
Home-page: https://ibis-project.org
Author: Ibis Maintainers
Author-email: maintainers@ibis-project.org
License: Apache-2.0
Location: /Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages
Requires: atpublic, bidict, filelock, multipledispatch, numpy, pandas, parsy, pins, pyarrow, pyarrow-hotfix, python-dateutil, pytz, rich, sqlglot, toolz, typing-extensions
Required-by: bigframes

Upgrading sqlglot to 20.8.0 doesn't resolve the error.

Upgrading ibis-framework results in the following dependency error:

bigframes 0.19.0 requires ibis-framework[bigquery]<7.2.0dev,>=7.1.0, but you have ibis-framework 7.2.0 which is incompatible.
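To make the pin in that message concrete, here is a rough sketch of why 7.2.0 falls outside `>=7.1.0,<7.2.0dev`. It is a simplification, not a PEP 440 implementation: it assumes three-part final release versions only, and for those an upper bound of `7.2.0dev` behaves the same as `7.2.0`. The function names are made up for this example.

```python
def release_tuple(version: str) -> tuple:
    """Parse a final release such as '7.1.0' into (7, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(version: str, lower: str = "7.1.0", upper: str = "7.2.0") -> bool:
    """Return True if lower <= version < upper, comparing release tuples."""
    return release_tuple(lower) <= release_tuple(version) < release_tuple(upper)


for candidate in ("7.1.0", "7.2.0"):
    print(candidate, satisfies_pin(candidate))
# 7.1.0 True
# 7.2.0 False
```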

+1

Could you try upgrading bigframes to the latest version 0.19.2 and see if you can repro the problem?

(venv) $ pip show sqlglot
Name: sqlglot
Version: 20.4.0
Summary: An easily customizable SQL parser and transpiler
Home-page: https://github.com/tobymao/sqlglot
Author: Toby Mao
Author-email: toby.mao@gmail.com
License: MIT
Location: /usr/local/google/home/ashleyxu/src/python-bigquery-dataframes/venv/lib/python3.10/site-packages
Requires: 
Required-by: ibis-framework
(venv) $ pip show ibis-framework
Name: ibis-framework
Version: 7.2.0
Summary: Productivity-centric Python Big Data Framework
Home-page: https://ibis-project.org
Author: Ibis Maintainers
Author-email: maintainers@ibis-project.org
License: Apache-2.0
Location: /usr/local/google/home/ashleyxu/src/python-bigquery-dataframes/venv/lib/python3.10/site-packages
Requires: atpublic, bidict, filelock, multipledispatch, numpy, pandas, parsy, pins, pyarrow, pyarrow-hotfix, python-dateutil, pytz, rich, sqlglot, toolz, typing-extensions
Required-by: bigframes

I already updated these modules but it was not resolved.

Same issue in this Cloud Skills Boost lab with Colab Enterprise: https://partner.cloudskillsboost.google/course_sessions/11372701/labs/448513

If you're facing this issue, please try reinstalling the package with pip, then restart your runtime and run the notebook again by following these steps:

  1. Expand the menu.
  2. Select Runtime
  3. Select Restart session

If this still doesn't work for you, you can try creating a brand-new project. (The issue might be related to permissions or cross-region issues.)

We're working on fixing it. Internal issue number: 326126888

Hi, @ashleyxuu

Code:

from bigframes.ml.preprocessing import OneHotEncoder

enc = OneHotEncoder()
X = bf.DataFrame({"a": ["Male", "Female", "Female"], "b": ["1", "3", "2"]})
enc.fit(X)
print(enc.transform(bf.DataFrame({"a": ["Female", "Male"], "b": ["1", "4"]})))

Error:

ParseError: Invalid expression / Unexpected token.
The above exception was the direct cause of the following exception:

ParseError                                Traceback (most recent call last)
Cell In[7], line 5
      3 enc = OneHotEncoder()
      4 X = bf.DataFrame({"a": ["Male", "Female", "Female"], "b": ["1", "3", "2"]})
----> 5 enc.fit(X)
      6 print(enc.transform(bf.DataFrame({"a": ["Female", "Male"], "b": ["1", "4"]})))

I am waiting for a fix. Thanks.

@uysalfurkan Thank you so much for reporting this issue! I apologize for the delay in responding. To help me track down this dependency mismatch, could you please provide the full call stack? Also, so that we can reproduce the error on our side, would you mind sharing your environment details? You can generate these using the following command:

import sys
!{sys.executable} -m pip freeze

In addition to #315 (comment), could you also please run the following cell:

import bigframes
bigframes.__version__

This will help us confirm that the version of bigframes in the notebook aligns with that claimed by pip.
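A minimal sketch of collecting the relevant versions in one go, using only the standard library. It reads installed package metadata (what pip sees), so comparing its output with `bigframes.__version__` from the running kernel can reveal a runtime that was not restarted after an upgrade. The function name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The three packages discussed in this thread.
for name, ver in installed_versions(["bigframes", "ibis-framework", "sqlglot"]).items():
    print(f"{name}: {ver or 'not installed'}")
```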

@chelsea-lin, @tswast sorry for the delayed response.

Environment details:
environment.txt

Full call stack:

Traceback (most recent call last):
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1056, in parse_into
    return self._parse(parser, raw_tokens, sql)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1095, in _parse
    self.raise_error("Invalid expression / Unexpected token")
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1136, in raise_error
    raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 64.
  encoded-hangout-414110._46f61dc8b3e2eb2697eb7be8fa45757c2d44aebe.anon271366cfceeb965c764bc43446c057e69691f83157ca78394ecb85df7904eb22

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/folders/wn/fkylpmg57j11htwlk6n249xr0000gp/T/ipykernel_37862/2695306357.py", line 5, in <module>
    enc.fit(X)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/ml/preprocessing.py", line 510, in fit
    self._bqml_model = self._bqml_model_factory.create_model(
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/ml/core.py", line 243, in create_model
    input_data = X_train._cached()
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/dataframe.py", line 3045, in _cached
    self._set_block(self._block.cached())
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/blocks.py", line 1677, in cached
    self.session._execute_and_cache(self.expr, cluster_cols=self.index_columns),
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/session/__init__.py", line 1479, in _execute_and_cache
    table_expression = self.ibis_client.table(
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/ibis/backends/bigquery/__init__.py", line 509, in table
    table = sg.parse_one(name, into=sg.exp.Table, read=self.name)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/__init__.py", line 123, in parse_one
    result = dialect.parse_into(into, sql, **opts)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 447, in parse_into
    return self.parser(**opts).parse_into(expression_type, self.tokenize(sql), sql)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1061, in parse_into
    raise ParseError(
sqlglot.errors.ParseError: Failed to parse 'encoded-hangout-414110._46f61dc8b3e2eb2697eb7be8fa45757c2d44aebe.anon271366cfceeb965c764bc43446c057e69691f83157ca78394ecb85df7904eb22' into <class 'sqlglot.expressions.Table'>

Re-opening, as it appears this issue occurs even with the latest supported sqlglot.

A thought: ibis 7.x and 8.x only use sqlglot in the BigQuery backend for unnest support. This isn't currently used by BigQuery DataFrames. We might be able to monkeypatch out the problematic code to avoid this.
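As a rough illustration of the monkeypatching idea, the sketch below swaps out a strict parsing step on a stand-in class. `StubBackend`, `_parse_name`, and `_split_only` are all hypothetical names invented for this example; this is not the ibis BigQuery backend and not the workaround that was eventually shipped.

```python
class StubBackend:
    """Stand-in for a backend object; NOT the real ibis BigQuery backend."""

    def _parse_name(self, name):
        # Pretend strict parser that rejects names containing a hyphen.
        if "-" in name:
            raise ValueError(f"Failed to parse {name!r}")
        return tuple(name.split("."))

    def table(self, name):
        return self._parse_name(name)


def _split_only(self, name):
    """Replacement that skips the strict parser and just splits on dots."""
    return tuple(name.split("."))


backend = StubBackend()
# Monkeypatch at class level so existing and future instances use the replacement.
StubBackend._parse_name = _split_only
print(backend.table("my-project.my_dataset.my_table"))
# ('my-project', 'my_dataset', 'my_table')
```

Patching the class attribute rather than a single instance means callers that create their own backend objects also pick up the replacement.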

I'd also like to try updating to ibis 8.x in #277 to see if that fixes this issue.

I'm having a similar issue even with basic operations like bq_df.head(). This is all taking place in the BigQuery notebook interface with the default runtime. Here's the full call stack:

ParseError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/sqlglot/parser.py in parse_into(self, expression_types, raw_tokens, sql)
   1038             try:
-> 1039                 return self._parse(parser, raw_tokens, sql)
   1040             except ParseError as e:

14 frames
ParseError: Invalid expression / Unexpected token. Line 1, Col: 60.
  TABLE_REDACTED

The above exception was the direct cause of the following exception:

ParseError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/sqlglot/parser.py in parse_into(self, expression_types, raw_tokens, sql)
   1042                 errors.append(e)
   1043 
-> 1044         raise ParseError(
   1045             f"Failed to parse '{sql or raw_tokens}' into {expression_types}",
   1046             errors=merge_errors(errors),

ParseError: Failed to parse 'TABLE_REDACTED' into <class 'sqlglot.expressions.Table'>


import bigframes
bigframes.__version__

0.21.0

import sqlglot
sqlglot.__version__

19.9.0

@ZeroCool2u Thanks for the report. My teammate @chelsea-lin was able to determine that there is a bug in sqlglot's parsing of BigQuery table IDs, which has been reported and will hopefully be fixed in a future release. In the meantime, I believe bigframes 0.22.0 works around this issue. Could you please try that version and report back whether it is fixed?

In a notebook:

%pip install --upgrade bigframes

And then restart your notebook runtime.

@tswast @chelsea-lin Just tested and seems fixed! Thank you!!