agronholm/sqlacodegen

Error when reading vector column of pgvector

np-kyokyo opened this issue · 8 comments

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

Sqlacodegen version

3.0.0rc5

SQLAlchemy version

2.0.31

RDBMS vendor

PostgreSQL

What happened?

Issue:

When attempting to add a new column to the recommend_model."Image" table using the following SQL command:

ALTER TABLE "recommend_model"."Image" ADD COLUMN  "embedding" vector(1408) NOT NULL;

then, we encountered an issue where the pgvector.sqlalchemy.vector.VECTOR type is not being properly recognized in the context of the get_adapted_type method. This results in the column type falling back to sqlalchemy.sql.type_api.UserDefinedType instead.

Using pgvector 0.3.0
Traceback (most recent call last):
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/bin/sqlacodegen", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/cli.py", line 101, in main
    outfile.write(generator.generate())
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 171, in generate
    self.fix_column_types(table)
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 651, in fix_column_types
    column.type = self.get_adapted_type(column.type)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 703, in get_adapted_type
    if new_coltype.compile(self.bind.engine.dialect) != compiled_type and (
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/type_api.py", line 1062, in compile
    return dialect.type_compiler_instance.process(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 960, in process
    return type_._compiler_dispatch(self, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/visitors.py", line 141, in _compiler_dispatch
    return meth(self, **kw)  # type: ignore  # noqa: E501
           ^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 7262, in visit_user_defined
    return type_.get_col_spec(**kw)
           ^^^^^^^^^^^^^^^^^^
AttributeError: 'UserDefinedType' object has no attribute 'get_col_spec'
def get_adapted_type(self, coltype: Any) -> Any:
    ...
    coltype = new_coltype
    if supercls.__name__ != supercls.__name__.upper():
        break

Analysis

The VECTOR type is recognized, but since it is an uppercase type, the loop continues to check for any camel case types. As a result, UserDefinedType is found, but since it does not have the get_col_spec method, an error occurs.

Potential Fixes

  1. Avoid to use UserDefinedType when get adapted type:

    if supercls.__visit_name__ == "user_defined":
        break
    coltype = new_coltype

    UserDefinedTypeが発見された場合、すでにnew_coltypeがあった場合はそちらを優先する方針がよさそう。
    It seems a good approach to prioritize an existing detected new_coltype if UserDefinedType is found.

    It seem to be below:

                 if supercls.__visit_name__ == "user_defined" and new_coltype is not None:
                     continue
           
                 ...
    
                 adapted_coltype = new_coltype
  2. Add Special Handling for VECTOR:
    Include a specific check for the VECTOR type:

    coltype = new_coltype
    if supercls.__name__ == "VECTOR":
        break

    VECTORの時点で現状、キャメルケースがくることはない。→しかし、今後来るようになる可能性はある?
    Currently, there are no camel case types coming after VECTOR. However, it is possible that they might come in the future.

  3. Remove the Uppercase Check:
    Replace the existing check with a break statement, allowing VECTOR to be recognized:

    coltype = new_coltype
    # if supercls.__name__ != supercls.__name__.upper():
    #     break

    → Due to the impact on other areas, this approach is rejected.

Output on success

embedding: Mapped[Any] = mapped_column(VECTOR(1408))

References

Commit where if supercls.name != supercls.name.upper() was introduced

ralated issue

#300

What is the difference between Text and TEXT types in SQLAlchemy?

ref. https://docs.sqlalchemy.org/en/20/core/type_basics.html#generic-camelcase-types

Removing if supercls.__name__ != supercls.__name__.upper(): break results in differences, such as parts previously outputting as Text now appearing as TEXT.

Database schema for reproducing the bug

(WIP)

I'm dealing with this issue... I've been too busy to handle it.

Did you install the project with the pgvector extra?

@agronholm
yes. this is minimal requirements.lock to reproduce the issue.

# generated by rye
# use `rye lock` or `rye sync` to update this lockfile
#
# last locked with the following flags:
#   pre: false
#   features: []
#   all-features: false
#   with-sources: false
#   generate-hashes: false

inflect==7.3.1
    # via sqlacodegen
more-itertools==10.3.0
    # via inflect
numpy==2.0.0
    # via pgvector
pgvector==0.3.1
psycopg2==2.9.9
sqlacodegen==3.0.0rc5
sqlalchemy==2.0.31
    # via sqlacodegen
typeguard==4.3.0
    # via inflect
typing-extensions==4.12.2
    # via sqlalchemy
    # via typeguard

Others have successfully tested the pgvector integration. Any idea why it's not working for you?

I downgraded pgvector to 0.2.5 and it worked fine, was having the same issue before!

Pinpointing the specific commit where it starts to fail would probably help.