meltano/sdk

feat(taps): Support declaring arbitrary SQLAlchemy type instances in SQL tap schemas

edgarrmondragon opened this issue · 1 comments

Feature scope

Tap/target metadata.

Description

Related to the original Singer spec's sql-datatype (see #1323, #1903), this new metadata object would work analogously to the way Python's logging configuration (logging.config.dictConfig) imports and instantiates arbitrary callables and classes.

The story, specifically for vector/embedding fields for LLMs, would go something like:

  1. As a user, I know that one or more fields in a stream represent a vector. For example, {"id": 1, "my_vector": [1, 2, 3]}.

  2. I would like to use MeltanoLabs/target-postgres alongside pgvector/pgvector-python to declare the SQLAlchemy type of my_vector: pgvector.sqlalchemy.Vector

  3. I would like to declare not only the type but also any arbitrary parameters for it, like length, dimension, etc.:

    # catalog metadata in Meltano syntax
    schema:
      my_stream:
        my_vector:
          sqlalchemy_type:
            (): "pgvector.sqlalchemy.Vector"
            dim: 3

    This would trigger the SDK to import pgvector.sqlalchemy.Vector and instantiate it as Vector(dim=3).

  4. The other requirement is that the user installs pgvector-python in the same virtual environment as target-postgres, which could be achieved with package extras, e.g. target-postgres[vector] or by documenting known postgres SQLAlchemy extensions in the target's readme.
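
The import-and-instantiate step in item 3 could be sketched roughly as follows. This is only an illustration of the dictConfig-style `()` convention, not an existing SDK API: the helper name `instantiate_from_config` is hypothetical, and `datetime.timedelta` stands in for `pgvector.sqlalchemy.Vector` so the snippet runs without extra packages.

```python
import importlib
from typing import Any, Mapping


def instantiate_from_config(config: Mapping[str, Any]) -> Any:
    """Import the callable named under the "()" key and call it with the other keys.

    Hypothetical helper, mirroring how logging.config.dictConfig treats "()".
    """
    params = dict(config)
    dotted_path = params.pop("()")  # e.g. "pgvector.sqlalchemy.Vector"
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted path, got {dotted_path!r}")
    factory = getattr(importlib.import_module(module_name), attr_name)
    # Remaining keys become keyword arguments, e.g. Vector(dim=3).
    return factory(**params)


# Stand-in for {"()": "pgvector.sqlalchemy.Vector", "dim": 3}.
delta = instantiate_from_config({"()": "datetime.timedelta", "days": 3})
```

With pgvector-python installed, the same call with the mapping from the catalog example would be expected to yield `Vector(dim=3)`. A real implementation would probably also want to validate that the result is a SQLAlchemy type before using it.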

Potential issues

The biggest problem I can think of is figuring out the priority with which this type override is considered. For example, in target-postgres:

  1. a few JSON schema types are mapped first to postgres-specific column types, e.g. int -> BIGINT
  2. then the SDK defaults are used

But by introducing this feature, we'd expect targets to resolve column types in the following order:

  1. sqlalchemy_type overrides
  2. custom target implementation overrides
  3. SDK defaults
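
A minimal sketch of that precedence, assuming each layer can be modelled as "return a type or None". The function and parameter names are hypothetical, and plain strings stand in for SQLAlchemy type instances; this is not how the SDK or target-postgres is currently structured.

```python
from typing import Any, Callable, Mapping, Optional

JsonSchema = Mapping[str, Any]


def resolve_column_type(
    property_schema: JsonSchema,
    sqlalchemy_type_override: Optional[Any],
    target_override: Callable[[JsonSchema], Optional[Any]],
    sdk_default: Callable[[JsonSchema], Any],
) -> Any:
    """Pick a column type using the proposed three-level precedence."""
    # 1. An explicit sqlalchemy_type override from the catalog metadata wins.
    if sqlalchemy_type_override is not None:
        return sqlalchemy_type_override
    # 2. Otherwise, the target's own mapping (e.g. integer -> BIGINT).
    custom = target_override(property_schema)
    if custom is not None:
        return custom
    # 3. Finally, fall back to the SDK default mapping.
    return sdk_default(property_schema)


def postgres_override(schema: JsonSchema) -> Optional[str]:
    # Stand-in for a target-specific mapping; only handles integers.
    return "BIGINT" if schema.get("type") == "integer" else None


def sdk_default(schema: JsonSchema) -> str:
    # Stand-in for the SDK's generic fallback.
    return "VARCHAR"


vector_type = resolve_column_type({"type": "array"}, "Vector(dim=3)", postgres_override, sdk_default)
integer_type = resolve_column_type({"type": "integer"}, None, postgres_override, sdk_default)
string_type = resolve_column_type({"type": "string"}, None, postgres_override, sdk_default)
```

Here the array field takes the explicit override, the integer field falls through to the target mapping, and the string field reaches the SDK default.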

Dear Edgar,

thank you for converging this from MeltanoLabs/target-pinecone#20 so quickly. Is GH-1872 actually already setting the stage for your proposal, or is it something different?

With kind regards,
Andreas.