feat(taps): Support declaring arbitrary SQLAlchemy type instances in SQL tap schemas
edgarrmondragon opened this issue · 1 comments
Feature scope
Tap/target metadata.
Description
Related to the original Singer spec's sql-datatype (see #1323, #1903), this new metadata object would work analogously to the way Python's logging module imports and instantiates arbitrary callables and classes (the `()` key in its dictionary configuration schema).
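For reference, here is a minimal sketch of the `logging.config` behavior that the analogy refers to. The formatter, handler, and logger names (`plain`, `console`, `demo`) are made up for illustration. The `()` key names a factory by dotted path, and the remaining keys are passed to it as keyword arguments.

```python
import logging
import logging.config

config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {
            # "()" tells dictConfig to import this callable and call it
            # with the remaining keys as keyword arguments.
            "()": "logging.Formatter",
            "fmt": "%(levelname)s:%(message)s",
        },
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        "demo": {"handlers": ["console"], "level": "INFO"},
    },
}

logging.config.dictConfig(config)

# The formatter instance was built by logging.Formatter(fmt=...).
formatter = logging.getLogger("demo").handlers[0].formatter
record = logging.LogRecord("demo", logging.INFO, __file__, 1, "hi", None, None)
formatted = formatter.format(record)
```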
The story, specifically for vector/embedding fields for LLMs, would go something like:
- As a user, I know that one or more fields in a stream represent a vector. For example, `{"id": 1, "my_vector": [1, 2, 3]}`.
- I would like to use MeltanoLabs/target-postgres alongside pgvector/pgvector-python to declare the SQLAlchemy type of `my_vector`: `pgvector.sqlalchemy.Vector`.
- I would like not only to declare the type but also to pass arbitrary parameters for it, like length, dimension, etc.:

      # catalog metadata in Meltano syntax
      schema:
        my_stream:
          my_vector:
            sqlalchemy_type:
              (): "pgvector.sqlalchemy.Vector"
              dim: 3

  This would trigger the SDK to import `pgvector.sqlalchemy.Vector` and instantiate it as `Vector(dim=3)`.
- The other requirement is that the user installs `pgvector-python` in the same virtual environment as target-postgres, which could be achieved with package extras, e.g. `target-postgres[vector]`, or by documenting known postgres SQLAlchemy extensions in the target's readme.
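A rough sketch of how the import-and-instantiate step could look, assuming the `sqlalchemy_type` mapping arrives as a plain dict. The helper name `instantiate_from_spec` is hypothetical and not an existing SDK function. The demo uses a standard-library class as a stand-in so that it runs without pgvector installed.

```python
import importlib
from typing import Any


def instantiate_from_spec(spec: dict[str, Any]) -> Any:
    """Import the callable named by the "()" key and call it with the other keys.

    Hypothetical helper for illustration; not part of the SDK.
    """
    kwargs = dict(spec)  # copy so the caller's mapping is not mutated
    dotted_path = kwargs.pop("()")
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path like 'pkg.mod.Name', got {dotted_path!r}")
    factory = getattr(importlib.import_module(module_name), attr_name)
    return factory(**kwargs)


# Stand-in spec using the standard library, equivalent to Fraction(numerator=1, denominator=3).
example = instantiate_from_spec(
    {"()": "fractions.Fraction", "numerator": 1, "denominator": 3}
)

# With pgvector installed, the spec from the story above would be handled the same way:
# instantiate_from_spec({"()": "pgvector.sqlalchemy.Vector", "dim": 3})
```

Because the dotted path comes from user-supplied catalog metadata, this amounts to importing and calling arbitrary code, so it should only be used with trusted configuration.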
Potential issues
The biggest problem I can think of is figuring out the priority with which this type override is considered. For example, in target-postgres:
- a few JSON schema types are mapped first to postgres-specific column types, e.g. `int -> BIGINT`
- then the SDK defaults are used
But by introducing this feature, we'd expect targets to resolve in the following order:
- `sqlalchemy_type` overrides
- custom target implementation overrides
- SDK defaults
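The resolution order above could be sketched as a chain of resolvers tried in priority order. All function names and the type mappings below are assumptions for illustration, not the SDK's or target-postgres's actual code. Type names are returned as strings to keep the example free of dependencies, and the sketch assumes the `sqlalchemy_type` metadata has already been merged into the property dict.

```python
from typing import Any, Optional


def from_sqlalchemy_type(prop: dict[str, Any]) -> Optional[str]:
    """Highest priority: an explicit sqlalchemy_type override (hypothetical key placement)."""
    spec = prop.get("sqlalchemy_type")
    return spec["()"] if spec else None


def from_target_overrides(prop: dict[str, Any]) -> Optional[str]:
    """Second: target-specific mappings, e.g. integers to BIGINT."""
    return "BIGINT" if prop.get("type") == "integer" else None


def from_sdk_defaults(prop: dict[str, Any]) -> Optional[str]:
    """Last: generic defaults (illustrative values only)."""
    return {"string": "VARCHAR", "integer": "INTEGER"}.get(prop.get("type"))


def resolve_column_type(prop: dict[str, Any]) -> str:
    """Return the first non-None result, trying resolvers in priority order."""
    for resolver in (from_sqlalchemy_type, from_target_overrides, from_sdk_defaults):
        result = resolver(prop)
        if result is not None:
            return result
    raise LookupError(f"no column type found for {prop!r}")


vector_type = resolve_column_type(
    {"type": "array", "sqlalchemy_type": {"()": "pgvector.sqlalchemy.Vector", "dim": 3}}
)
integer_type = resolve_column_type({"type": "integer"})
string_type = resolve_column_type({"type": "string"})
```

Keeping each source of type information in its own resolver makes the priority explicit in one place, which is the part this proposal would need targets to agree on.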
Dear Edgar,
thank you for converging this from MeltanoLabs/target-pinecone#20 so quickly. Is GH-1872 actually already setting the stage for your proposal, or is it something different?
With kind regards,
Andreas.