Error when reading vector column of pgvector
np-kyokyo opened this issue · 8 comments
Things to check first
-
I have searched the existing issues and didn't find my bug already reported there
-
I have checked that my bug is still present in the latest release
Sqlacodegen version
3.0.0rc5
SQLAlchemy version
2.0.31
RDBMS vendor
PostgreSQL
What happened?
Issue:
When attempting to add a new column to the recommend_model."Image"
table using the following SQL command:
ALTER TABLE "recommend_model"."Image" ADD COLUMN "embedding" vector(1408) NOT NULL;
then, we encountered an issue where the pgvector.sqlalchemy.vector.VECTOR
type is not being properly recognized in the context of the get_adapted_type
method. This results in the column type falling back to sqlalchemy.sql.type_api.UserDefinedType
instead.
Using pgvector 0.3.0
Traceback (most recent call last):
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/bin/sqlacodegen", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/cli.py", line 101, in main
outfile.write(generator.generate())
^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 171, in generate
self.fix_column_types(table)
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 651, in fix_column_types
column.type = self.get_adapted_type(column.type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 703, in get_adapted_type
if new_coltype.compile(self.bind.engine.dialect) != compiled_type and (
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/type_api.py", line 1062, in compile
return dialect.type_compiler_instance.process(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 960, in process
return type_._compiler_dispatch(self, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/visitors.py", line 141, in _compiler_dispatch
return meth(self, **kw) # type: ignore # noqa: E501
^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 7262, in visit_user_defined
return type_.get_col_spec(**kw)
^^^^^^^^^^^^^^^^^^
AttributeError: 'UserDefinedType' object has no attribute 'get_col_spec'
def get_adapted_type(self, coltype: Any) -> Any:
...
coltype = new_coltype
if supercls.__name__ != supercls.__name__.upper():
break
Analysis
The VECTOR type is recognized, but since it is an uppercase type, the loop continues to check for any camel case types. As a result, UserDefinedType is found, but since it does not have the get_col_spec method, an error occurs.
Potential Fixes
-
Avoid to use UserDefinedType when get adapted type:
if supercls.__visit_name__ == "user_defined": break coltype = new_coltype
UserDefinedTypeが発見された場合、すでにnew_coltypeがあった場合はそちらを優先する方針がよさそう。
It seems a good approach to prioritize an existing detected new_coltype if UserDefinedType is found.It seem to be below:
if supercls.__visit_name__ == "user_defined" and new_coltype is not None: continue ... adapted_coltype = new_coltype
-
Add Special Handling for
VECTOR
:
Include a specific check for theVECTOR
type:coltype = new_coltype if supercls.__name__ == "VECTOR": break
VECTORの時点で現状、キャメルケースがくることはない。→しかし、今後来るようになる可能性はある?
Currently, there are no camel case types coming after VECTOR. However, it is possible that they might come in the future. -
Remove the Uppercase Check:
Replace the existing check with a break statement, allowingVECTOR
to be recognized:coltype = new_coltype # if supercls.__name__ != supercls.__name__.upper(): # break
→ Due to the impact on other areas, this approach is rejected.
Output on success
embedding: Mapped[Any] = mapped_column(VECTOR(1408))
References
Commit where if supercls.name != supercls.name.upper() was introduced
ralated issue
What is the difference between Text
and TEXT
types in SQLAlchemy?
ref. https://docs.sqlalchemy.org/en/20/core/type_basics.html#generic-camelcase-types
Removing if supercls.__name__ != supercls.__name__.upper(): break
results in differences, such as parts previously outputting as Text
now appearing as TEXT
.
Database schema for reproducing the bug
(WIP)
I'm dealing with this issue... I've been too busy to handle it.
Did you install the project with the pgvector
extra?
@agronholm
yes. this is minimal requirements.lock to reproduce the issue.
# generated by rye
# use `rye lock` or `rye sync` to update this lockfile
#
# last locked with the following flags:
# pre: false
# features: []
# all-features: false
# with-sources: false
# generate-hashes: false
inflect==7.3.1
# via sqlacodegen
more-itertools==10.3.0
# via inflect
numpy==2.0.0
# via pgvector
pgvector==0.3.1
psycopg2==2.9.9
sqlacodegen==3.0.0rc5
sqlalchemy==2.0.31
# via sqlacodegen
typeguard==4.3.0
# via inflect
typing-extensions==4.12.2
# via sqlalchemy
# via typeguard
Others have successfully tested the pgvector integration. Any idea why it's not working for you?
I downgraded pgvector to 0.2.5 and it worked fine, was having the same issue before!
Pinpointing the specific commit where it starts to fail would probably help.