The `VectorTable` class is a PHP implementation designed to facilitate the storage, retrieval, and comparison of high-dimensional vectors in a MySQL database. This class utilizes MySQL JSON data types and a custom cosine similarity function (`COSIM`) to perform vector comparisons efficiently.
Vectors are binary quantized upon insertion into the database to optimize search speed, and search results are reranked to improve accuracy. However, this library is only suitable for small datasets (fewer than 1,000,000 vectors). For larger datasets, it is recommended that you use a dedicated vector database such as Qdrant.
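To illustrate the idea behind binary quantization, here is a minimal sketch in plain PHP. It assumes a sign-based scheme (one bit per dimension, set when the component is positive) with Hamming distance as the coarse comparison. The function names `quantize` and `hammingDistance` are hypothetical and are not taken from the library, whose internal scheme may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reduce a float vector to a bit string,
 * one character per dimension ('1' if the component is positive).
 */
function quantize(array $vector): string
{
    $bits = '';
    foreach ($vector as $component) {
        $bits .= $component > 0 ? '1' : '0';
    }
    return $bits;
}

/** Hypothetical helper: count the positions where two bit strings differ. */
function hammingDistance(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Bit strings must have equal length.');
    }
    $distance = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }
    return $distance;
}

$query  = quantize([0.5, -0.2, 0.1, -0.9]);     // "1010"
$stored = quantize([0.4, 0.3, -0.1, -0.7]);     // "1100"
echo hammingDistance($query, $stored), PHP_EOL; // prints 2
```

A cheap comparison of this kind can narrow the set of candidates, after which the full-precision vectors are reranked with cosine similarity.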
Search Benchmarks (384-dimensional vectors):
| Vectors | Time (seconds) |
|---|---|
| 100 | 0.02 |
| 1000 | 0.02 |
| 10000 | 0.03 |
| 100000 | 0.06 |
| 1000000 | 0.48 |
- Store vectors in a MySQL database using JSON data types.
- Calculate cosine similarity between vectors using a custom MySQL function.
- Normalize vectors and handle vector operations such as insertion, deletion, and searching.
- Support for vector quantization for optimized search operations.
- Native PHP support for generating text embeddings using the BGE embedding model.
- PHP 8.0 or higher.
- MySQL 5.7 or higher with support for JSON data types and stored functions.
- The MySQLi extension for PHP.
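The PHP-side requirements can be verified with a short preflight script. This is a sketch only; `checkRequirements` is a hypothetical helper and is not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical preflight check for the PHP-side requirements.
 * Returns a list of human-readable problems; an empty list means OK.
 */
function checkRequirements(): array
{
    $problems = [];
    if (version_compare(PHP_VERSION, '8.0.0', '<')) {
        $problems[] = 'PHP 8.0 or higher is required, found ' . PHP_VERSION;
    }
    if (!extension_loaded('mysqli')) {
        $problems[] = 'The mysqli extension is not loaded.';
    }
    return $problems;
}

// Print each problem on its own line; prints nothing when all checks pass.
foreach (checkRequirements() as $problem) {
    echo $problem, PHP_EOL;
}
```

The MySQL server version can be checked once a connection exists, for example through the `server_version` property of a `mysqli` object.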
- Ensure that PHP and MySQL are installed and properly configured on your system.
- Install the library using Composer:

      composer require allanpichardo/mysql-vector

Import the `VectorTable` class and create a new instance using the MySQLi connection, table name, vector dimension, and storage engine.
use MHz\MysqlVector\VectorTable;
$mysqli = new mysqli("hostname", "username", "password", "database");
$tableName = "my_vector_table";
$dimension = 384;
$engine = 'InnoDB';
$vectorTable = new VectorTable($mysqli, $tableName, $dimension, $engine);
The `initialize` method creates the vector table and the `COSIM` function in MySQL if they do not already exist.
$vectorTable->initialize();
// Insert a new vector
$vector = [0.1, 0.2, 0.3, ..., 0.384];
$vectorId = $vectorTable->upsert($vector);
// Update an existing vector
$vectorTable->upsert($vector, $vectorId);
// Delete a vector
$vectorTable->delete($vectorId);
// Calculate cosine similarity between two vectors
$similarity = $vectorTable->cosim($vector1, $vector2);
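For reference, cosine similarity is the dot product of two vectors divided by the product of their magnitudes. The sketch below computes it in plain PHP by normalizing both vectors first. `normalize` and `cosineSimilarity` are hypothetical names; the library's `cosim` method and the `COSIM` MySQL function may be implemented differently.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: scale a vector (a list of floats) to unit length. */
function normalize(array $vector): array
{
    $sumOfSquares = 0.0;
    foreach ($vector as $x) {
        $sumOfSquares += $x * $x;
    }
    $magnitude = sqrt($sumOfSquares);
    if ($magnitude === 0.0) {
        throw new InvalidArgumentException('Cannot normalize a zero vector.');
    }
    return array_map(fn($x) => $x / $magnitude, $vector);
}

/** Hypothetical helper: cosine similarity as the dot product of unit vectors. */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same dimension.');
    }
    $a = normalize($a);
    $b = normalize($b);
    $dot = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
    }
    return $dot;
}

printf("%.4f\n", cosineSimilarity([1.0, 0.0], [1.0, 1.0])); // prints 0.7071
```

Because both inputs are normalized, the result depends only on the angle between the vectors and not on their lengths.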
Search for vectors similar to a given vector, ranked by cosine similarity. The `topN` parameter specifies the maximum number of similar vectors to return.
// Find vectors similar to a given vector
$similarVectors = $vectorTable->search($vector, $topN);
The `Embedder` class calculates 384-dimensional text embeddings using the BGE embedding model. The first time you instantiate the `Embedder` class, the ONNX runtime will be installed automatically.
The maximum length of the input text is 512 characters. Longer input is automatically truncated to 512 characters.
use MHz\MysqlVector\Nlp\Embedder;
$embedder = new Embedder();
// Calculate the embeddings for a batch of text
$texts = ["Hello world!", "This is a test."];
$embeddings = $embedder->embed($texts);
print_r($embeddings[0][0]); // [0.1, 0.2, 0.3, ..., 0.384]
print_r($embeddings[1][0]); // [0.1, 0.2, 0.3, ..., 0.384]
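The truncation rule can be mirrored on the caller's side, for instance to detect inputs that would lose text before they are embedded. This is a minimal sketch, assuming the limit counts Unicode characters and that the `mbstring` extension is available; `truncateForEmbedding` is a hypothetical helper, not part of the library.

```php
<?php
declare(strict_types=1);

// Assumed limit, taken from the 512-character maximum stated above.
const MAX_INPUT_CHARS = 512;

/** Hypothetical helper: cut text to at most $limit characters. */
function truncateForEmbedding(string $text, int $limit = MAX_INPUT_CHARS): string
{
    return mb_strlen($text, 'UTF-8') > $limit
        ? mb_substr($text, 0, $limit, 'UTF-8')
        : $text;
}

$long = str_repeat('a', 600);
echo mb_strlen(truncateForEmbedding($long), 'UTF-8'), PHP_EOL; // prints 512
echo truncateForEmbedding('Hello world!'), PHP_EOL;             // prints Hello world!
```

Comparing the original and truncated strings shows whether a text should be split into smaller chunks before embedding.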
Contributions to this project are welcome. Please ensure that your code adheres to the existing coding standards and includes appropriate tests.
MIT License