Error when reading vector column of pgvector

Question

Error when reading vector column of pgvector

np-kyokyo opened this issue 6 months ago · 8 comments

np-kyokyo commented 6 months ago

Things to check first

I have searched the existing issues and didn't find my bug already reported there
I have checked that my bug is still present in the latest release

Sqlacodegen version

3.0.0rc5

SQLAlchemy version

2.0.31

RDBMS vendor

PostgreSQL

What happened?

Issue:

When attempting to add a new column to the recommend_model."Image" table using the following SQL command:

ALTER TABLE "recommend_model"."Image" ADD COLUMN  "embedding" vector(1408) NOT NULL;

then, we encountered an issue where the pgvector.sqlalchemy.vector.VECTOR type is not being properly recognized in the context of the get_adapted_type method. This results in the column type falling back to sqlalchemy.sql.type_api.UserDefinedType instead.

Using pgvector 0.3.0
Traceback (most recent call last):
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/bin/sqlacodegen", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/cli.py", line 101, in main
    outfile.write(generator.generate())
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 171, in generate
    self.fix_column_types(table)
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 651, in fix_column_types
    column.type = self.get_adapted_type(column.type)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 703, in get_adapted_type
    if new_coltype.compile(self.bind.engine.dialect) != compiled_type and (
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/type_api.py", line 1062, in compile
    return dialect.type_compiler_instance.process(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 960, in process
    return type_._compiler_dispatch(self, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/visitors.py", line 141, in _compiler_dispatch
    return meth(self, **kw)  # type: ignore  # noqa: E501
           ^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 7262, in visit_user_defined
    return type_.get_col_spec(**kw)
           ^^^^^^^^^^^^^^^^^^
AttributeError: 'UserDefinedType' object has no attribute 'get_col_spec'

def get_adapted_type(self, coltype: Any) -> Any:
    ...
    coltype = new_coltype
    if supercls.__name__ != supercls.__name__.upper():
        break

Analysis

The VECTOR type is recognized, but since it is an uppercase type, the loop continues to check for any camel case types. As a result, UserDefinedType is found, but since it does not have the get_col_spec method, an error occurs.

Potential Fixes

Avoid to use UserDefinedType when get adapted type:

if supercls.__visit_name__ == "user_defined":
    break
coltype = new_coltype

UserDefinedTypeが発見された場合、すでにnew_coltypeがあった場合はそちらを優先する方針がよさそう。
It seems a good approach to prioritize an existing detected new_coltype if UserDefinedType is found.

It seem to be below:

             if supercls.__visit_name__ == "user_defined" and new_coltype is not None:
                 continue
       
             ...

             adapted_coltype = new_coltype

Add Special Handling for VECTOR:
Include a specific check for the VECTOR type:
```
coltype = new_coltype
if supercls.__name__ == "VECTOR":
    break
```
VECTORの時点で現状、キャメルケースがくることはない。→しかし、今後来るようになる可能性はある？
Currently, there are no camel case types coming after VECTOR. However, it is possible that they might come in the future.
Remove the Uppercase Check:
Replace the existing check with a break statement, allowing VECTOR to be recognized:
```
coltype = new_coltype
# if supercls.__name__ != supercls.__name__.upper():
#     break
```
→ Due to the impact on other areas, this approach is rejected.

Output on success

embedding: Mapped[Any] = mapped_column(VECTOR(1408))

References

Commit where if supercls.name != supercls.name.upper() was introduced

Commit 09ff6744ef6267484ab31d8c2e08e43801aac8b9

ralated issue

#300

What is the difference between `Text` and `TEXT` types in SQLAlchemy?

ref. https://docs.sqlalchemy.org/en/20/core/type_basics.html#generic-camelcase-types

Removing if supercls.__name__ != supercls.__name__.upper(): break results in differences, such as parts previously outputting as Text now appearing as TEXT.

Database schema for reproducing the bug

(WIP)

Answer 1 · 2024-07-12T03:18:09.000Z

~~I'm dealing with this issue...~~ I've been too busy to handle it.

Answer 2 · 2024-07-12T09:46:17.000Z

Did you install the project with the pgvector extra?

Answer 3 · 2024-07-12T09:57:09.000Z

@agronholm
yes. this is minimal requirements.lock to reproduce the issue.

# generated by rye
# use `rye lock` or `rye sync` to update this lockfile
#
# last locked with the following flags:
#   pre: false
#   features: []
#   all-features: false
#   with-sources: false
#   generate-hashes: false

inflect==7.3.1
    # via sqlacodegen
more-itertools==10.3.0
    # via inflect
numpy==2.0.0
    # via pgvector
pgvector==0.3.1
psycopg2==2.9.9
sqlacodegen==3.0.0rc5
sqlalchemy==2.0.31
    # via sqlacodegen
typeguard==4.3.0
    # via inflect
typing-extensions==4.12.2
    # via sqlalchemy
    # via typeguard

Answer 4 · 2024-07-13T14:10:56.000Z

Others have successfully tested the pgvector integration. Any idea why it's not working for you?

Answer 5 · 2024-07-18T22:28:42.000Z

I downgraded pgvector to 0.2.5 and it worked fine, was having the same issue before!

Answer 6 · 2024-07-18T22:31:46.000Z

Pinpointing the specific commit where it starts to fail would probably help.