`mapping_types` config seems to be ignored on `diff`
OskarStark opened this issue · 1 comments
Using the latest bundle version and Postgres.
I have the following entity
<?php
declare(strict_types=1);
namespace App\Entity;
use Doctrine\ORM\Mapping as ORM;
use LLPhant\Embeddings\VectorStores\Doctrine\DoctrineEmbeddingEntityBase;
#[ORM\Entity]
class Snippet extends DoctrineEmbeddingEntityBase
{
#[ORM\Id]
#[ORM\Column(type: 'integer')]
#[ORM\GeneratedValue(strategy: 'AUTO')]
public int $id;
#[ORM\Column(type: VectorType::VECTOR, length: 1536)]
public ?array $embedding;
public function getId(): int
{
return $this->id;
}
}
VectorType
:
<?php
namespace LLPhant\Embeddings\VectorStores\Doctrine;
use Doctrine\DBAL\Exception;
use Doctrine\DBAL\Platforms\AbstractPlatform;
use Doctrine\DBAL\Platforms\PostgreSQLPlatform;
use Doctrine\DBAL\Types\Type;
use Doctrine\Deprecations\Deprecation;
class VectorType extends Type
{
final public const VECTOR = 'vector';
/**
* @param mixed[] $column
*/
public function getSQLDeclaration(array $column, AbstractPlatform $platform): string
{
// getName is deprecated since doctrine/dbal 2.13 see: https://github.com/doctrine/dbal/issues/4749
// BUT it is the most stable way to check if the platform is PostgreSQLPlatform in a lot of doctrine versions
// so we will use it and add a check for the class name in case it is removed in the future
if (method_exists($platform, 'getName') && $platform->getName() !== 'postgresql') {
throw Exception::notSupported('VECTORs not supported by Platform.');
}
if (! isset($column['length'])) {
throw Exception::notSupported('VECTORs must have a length.');
}
if ($column['length'] < 1) {
throw Exception::notSupported('VECTORs must have a length greater than 0.');
}
if (! is_int($column['length'])) {
throw Exception::notSupported('VECTORs must have a length that is an integer.');
}
return sprintf('vector(%d)', $column['length']);
}
/**
* @return float[]
*/
public function convertToPHPValue(mixed $value, AbstractPlatform $platform): array
{
if ($value === null) {
return [];
}
$value = is_resource($value) ? stream_get_contents($value) : $value;
if (! is_string($value)) {
throw Exception::notSupported('Error while converting VECTORs to PHP value.');
}
$convertedValue = explode(',', $value);
$floatArray = [];
foreach ($convertedValue as $singleConvertedValue) {
$floatArray[] = (float) $singleConvertedValue;
}
return $floatArray;
}
public function convertToDatabaseValue(mixed $value, AbstractPlatform $platform): string
{
//If $value is not a float array throw an exception
if (! is_array($value)) {
throw Exception::notSupported('VECTORs must be an array.');
}
return VectorUtils::getVectorAsString($value);
}
public function getName(): string
{
return self::VECTOR;
}
}
# config/packages/doctrine.yaml
doctrine:
dbal:
url: '%env(resolve:DATABASE_URL)%'
mapping_types:
vector: array
types:
vector: LLPhant\Embeddings\VectorStores\Doctrine\VectorType
the created migration:
final class Version20231101073053 extends AbstractMigration
{
public function up(Schema $schema): void
{
$this->addSql('CREATE SEQUENCE snippet_id_seq INCREMENT BY 1 MINVALUE 1 START 1');
$this->addSql('CREATE TABLE snippet (id INT NOT NULL, embedding vector(1536) NOT NULL, PRIMARY KEY(id))');
}
public function down(Schema $schema): void
{
$this->throwIrreversibleMigrationException();
}
}
From my understanding, because I added mapping_types: vector: array
it should look like:
$this->addSql('CREATE SEQUENCE snippet_id_seq INCREMENT BY 1 MINVALUE 1 START 1');
$this->addSql('CREATE TABLE snippet (id INT NOT NULL, embedding vector(1536) NOT NULL, PRIMARY KEY(id))');
+ $this->addSql('COMMENT ON COLUMN snippet.embedding IS \'(DC2Type:array)\'');
If I manually add this line, all works good, but if I want to generate the diff again I get another migration over and over again:
final class Version20231101073443 extends AbstractMigration
{
public function up(Schema $schema): void
{
$this->addSql('ALTER TABLE snippet ALTER embedding TYPE vector(1536)');
$this->addSql('COMMENT ON COLUMN snippet.embedding IS NULL');
}
public function down(Schema $schema): void
{
$this->throwIrreversibleMigrationException();
}
}
From my understanding of mapping types, types and comments...
mapping_types
are rules used by DBAL to convert SQL types to DBAL types.types
are rules used by DBAL to convert DBAL types into PHP Types (or better, to choose a converter that maps "SQL data" to PHP types/data).
as example:
- postgres
bigserial
is converted into DBAL'sbigint
(thanks to built inmapping_types
) - DBAL's
bigint
is later converted in PHP'sstring
(thanks to built intypes
)
Thus in your case i would expect something as:
mapping_types:
vector: LLPhantVectorType
types:
LLPhantVectorType: LLPhant\Embeddings\VectorStores\Doctrine\VectorType
The fact that your implementation of VectorType
returns an array
is is unknown to doctrine.
The comment is used to infer a Doctrine type when there are multiple doctrine types mapped on the same SQL type. The classical example is:
Consider an SQL type varchar(255)
, that type can be used as the Php's string
, MyEnum
(a backed enum that uses strings as cases) or any PHP type that can be serialized into a string with max 255 chrs.
When DBAL encounters a vector
SQL type, needs to decide which DBAL Type assign to it. If a comment is found, it will try to find that type, if not it will find a default.
You might have configs defined as follows:
mapping_types:
vector: LLPhantVectorType
types:
LLPhantVectorType: LLPhant\Embeddings\VectorStores\Doctrine\VectorType
GoetasVectorType: GoetasVectorType
In this case, both GoetasVectorType
and LLPhant\Embeddings\VectorStores\Doctrine\VectorType
want to use a SQL vector, but DBAL needs to a way to decide which type to instantaite when an SQL column of type vector
is found.
GoetasVectorType
will have to override requiresSQLCommentHint
. Columns that use GoetasVectorType
will have a comment, columns that use LLPhant\Embeddings\VectorStores\Doctrine\VectorType
will not.
When DBAL finds an SQL vector
type will check for a comment, when found will try to instantiate the type in the comment. If no comment is found, will try to instantiate the type defined in the configs.