agronholm/sqlacodegen

Enhancement: Add Support for pgvector extension

KellyRousselHoomano opened this issue · 2 comments

Things to check first

  • I have searched the existing issues and didn't find my feature already requested there

Feature description

I propose enhancing sqlacodegen to include native support for the pgvector extension. Currently, the tool does not recognize the 'Vector' type of the pgvector extension. To address this, I have forked the repository and created a dedicated branch (feature-pgvector).

In this branch, I followed the same process used for previous extensions such as "citext" and "geoalchemy2" to enable support for pgvector. The changes can be reviewed in the branch, and I have verified that pgvector is correctly installed by the following command:

pip install git+https://github.com/hoomano/sqlacodegen.git@feature-pgvector#egg=sqlacodegen\[pgvector\]

However, despite successful installation, running the sqlacodegen command line to export PostgreSQL database models results in the following warning:

sqlacodegen/cli.py:81: SAWarning: Did not recognize type 'vector' of column 'embedding'
  metadata.reflect(engine, schema, not args.noviews, tables)
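A possible cause, offered as a guess rather than a diagnosis: SQLAlchemy only recognizes a reflected column type when the dialect has a class registered for that type name, and to my understanding pgvector registers `vector` when `pgvector.sqlalchemy` is imported. If that import never runs before `metadata.reflect(...)`, installing the package alone would not silence the warning. The sketch below shows an optional-import pattern of the kind described for citext and geoalchemy2. The helper name `load_optional_modules` is hypothetical and is not part of sqlacodegen.

```python
import importlib


def load_optional_modules(module_names):
    """Try to import each optional module; map its name to the module or None.

    Hypothetical helper for illustration only. Importing the module is the
    step that matters: any reflection hooks it installs run at import time.
    """
    loaded = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            # The extra is not installed; carry on without it.
            loaded[name] = None
    return loaded


# Assumed module names for the three optional extras mentioned in this issue.
status = load_optional_modules(["pgvector.sqlalchemy", "geoalchemy2", "citext"])
for name, module in status.items():
    print(f"{name}: {'available' if module is not None else 'not installed'}")
```

If `pgvector.sqlalchemy` reports as not installed in the environment where the sqlacodegen command runs, the extra was probably installed into a different environment.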

Use case

The need for pgvector support in sqlacodegen arises from the growing adoption of Large Language Models (LLMs) and our wish to build a retrieval tool that uses pgvector to store and query embeddings in a PostgreSQL database. For some use cases, a dedicated retrieval database seems like overkill.

By adding native support for the pgvector extension, sqlacodegen would empower users to seamlessly integrate their PostgreSQL databases with pgvector, leveraging its capabilities such as cosine distance metrics for retrieval purposes.
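For readers unfamiliar with the metric, cosine distance is 1 minus the cosine similarity of two vectors, which is what pgvector's `<=>` operator computes. Below is a minimal pure-Python sketch of the formula. The function name and the sample embeddings are made up for illustration, and raising an error on zero vectors is a choice made here, not a description of pgvector's behavior.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Illustrative choice: the angle is undefined for a zero vector.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical embeddings: rank stored rows by distance to a query vector.
query = [1.0, 0.0]
rows = {"doc_a": [0.0, 1.0], "doc_b": [2.0, 0.5]}
ranked = sorted(rows, key=lambda name: cosine_distance(query, rows[name]))
print(ranked)
```

A smaller distance means the vectors point in a more similar direction, so the first entry in `ranked` is the closest match.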

This feature not only addresses our immediate requirements but also extends the utility of sqlacodegen to a broader audience engaged in similar use cases involving advanced data types like pgvector.

Your collaboration and insights on this feature request are highly appreciated. 😁

Would you create a PR for this?

Sure! Here it is: #301