❗❗ This repo will no longer be maintained, please visit https://github.com/milvus-io/bootcamp ❗ ❗
This solution shows how to combine Milvus (a vector database) with PostgreSQL (a relational database) to run a hybrid search over vectors and structured data.
In the example below, feature vectors and structured data represent human face attributes. The hybrid search works as follows: first, search for the 10 vectors most similar to a given query vector (for example, the feature vector of a specified face image), along with their Euclidean distances. Then, among those 10 results, keep the vectors whose Euclidean distance is less than 1 and which also meet the filtering conditions (gender, time, and whether the face wears glasses) stored in PostgreSQL.
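The two-stage flow described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in the repository's scripts: the in-memory dictionaries stand in for Milvus and PostgreSQL, and the names `hybrid_search`, `vectors`, and `attributes` are hypothetical.

```python
import math

def hybrid_search(query, vectors, attributes, top_k=10, threshold=1.0, **filters):
    """Return (id, distance) pairs that pass both the distance and attribute filters.

    vectors:    dict mapping id -> list of floats (stand-in for Milvus)
    attributes: dict mapping id -> dict of attributes (stand-in for PostgreSQL)
    filters:    attribute name -> required value, e.g. sex="male"
    """
    # Stage 1: the top_k nearest neighbours by Euclidean distance.
    ranked = sorted(
        ((vid, math.dist(query, vec)) for vid, vec in vectors.items()),
        key=lambda pair: pair[1],
    )[:top_k]

    # Stage 2: keep hits under the distance threshold whose attributes match.
    results = []
    for vid, dist in ranked:
        if dist >= threshold:
            continue
        row = attributes[vid]
        if all(row.get(name) == value for name, value in filters.items()):
            results.append((vid, dist))
    return results


# Hypothetical toy data: three 2-dimensional "face" vectors and their attributes.
vectors = {1: [0.0, 0.0], 2: [0.3, 0.4], 3: [3.0, 4.0]}
attributes = {
    1: {"sex": "male", "glasses": True},
    2: {"sex": "female", "glasses": False},
    3: {"sex": "male", "glasses": False},
}
hits = hybrid_search([0.0, 0.0], vectors, attributes, sex="male")
print(hits)
```

In this toy data, vector 3 is male but lies at distance 5 from the query, so the distance threshold removes it before the attribute filter is applied.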
Before executing the hybrid search, make sure you have completed the following steps:
- Install Milvus 0.10.4
- Install PostgreSQL
- Use `pip install numpy` to install numpy.
- Use `pip install psycopg2-binary` to install psycopg2.
- Use `pip install faker` to install Faker.
The data used in this test are from ANN_SIFT1B.
- Base vectors: ANN_SIFT1B Base_set
- Query vectors: ANN_SIFT1B Query_set
Note: You can also use data in `bvecs` format.
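For readers unfamiliar with `bvecs`: each record is a 4-byte little-endian integer giving the dimension `d`, followed by `d` unsigned bytes. The sketch below reads such a file using only the standard library. The function name `read_bvecs` and the temporary sample file are hypothetical, and the repository's scripts may load data differently.

```python
import os
import struct
import tempfile

def read_bvecs(path, limit=None):
    """Read vectors from a .bvecs file as lists of ints in [0, 255]."""
    vectors = []
    with open(path, "rb") as f:
        while limit is None or len(vectors) < limit:
            header = f.read(4)
            if len(header) < 4:  # end of file
                break
            (dim,) = struct.unpack("<i", header)  # 4-byte little-endian dimension
            body = f.read(dim)                    # dim unsigned bytes
            if len(body) < dim:
                raise ValueError("truncated vector record")
            vectors.append(list(body))
    return vectors


# Write a tiny two-vector sample file, then read it back.
with tempfile.NamedTemporaryFile(suffix=".bvecs", delete=False) as tmp:
    for vec in ([1, 2, 3, 4], [250, 251, 252, 253]):
        tmp.write(struct.pack("<i", len(vec)) + bytes(vec))
sample = read_bvecs(tmp.name)
first_only = read_bvecs(tmp.name, limit=1)
os.remove(tmp.name)
print(sample)
```

For files at the scale of ANN_SIFT1B, passing `limit` (or a memory-mapped reader) avoids loading the whole file at once.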
The following test scripts are used in this example:
- `mixed_import.py` for importing data into Milvus and PostgreSQL.
- `mixed_query.py` for executing customized hybrid search.
Before executing `mixed_import.py`, edit the following parameters in the script to match your runtime environment and data.
Parameter | Description |
---|---|
`MILVUS_TABLE` | Name of the table to create in Milvus. |
`PG_TABLE_NAME` | Name of the table to create in PostgreSQL. |
`FILE_PATH` | Path of local storage of base vectors. |
`VEC_NUM` | Total number of vectors to import into Milvus. |
`BASE_LEN` | Number of vectors imported into the table per batch. |
`VEC_DIM` | Dimension set in the table in Milvus. It should be set to the dimension of the data to be imported. |
`SERVER_ADDR` | Address of Milvus server. |
`SERVER_PORT` | Port of Milvus server. |
`PG_HOST` | Address of PostgreSQL server. |
`PG_PORT` | Port of PostgreSQL server. |
`PG_USER` | Username to use in PostgreSQL. |
`PG_PASSWORD` | Password to use in PostgreSQL. |
`PG_DATABASE` | Database to use in PostgreSQL. |
After configuring the above parameters, import the data with the following command:
python3 mixed_import.py
After the script finishes, the vectors are imported into Milvus, and the corresponding vector ids and vector attributes (gender, the time the vector was generated, and whether the face wears glasses) are stored in PostgreSQL.
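The batching controlled by `VEC_NUM` and `BASE_LEN`, and the pairing of each vector id with generated attributes, might look roughly like the sketch below. It makes several assumptions: the standard `random` module stands in for Faker, the helper names `batch_ranges` and `make_attribute_row` are hypothetical, and no Milvus or PostgreSQL calls are made.

```python
import random
from datetime import datetime, timedelta

VEC_NUM = 10   # total vectors to import (tiny value for illustration)
BASE_LEN = 4   # vectors per batch

def batch_ranges(total, batch_size):
    """Yield (start, end) index pairs covering range(total) in batches."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)

def make_attribute_row(vector_id, rng):
    """Build one (id, sex, time, glasses) row with random attribute values."""
    sex = rng.choice(["male", "female"])
    created = datetime(2019, 1, 1) + timedelta(seconds=rng.randrange(365 * 24 * 3600))
    glasses = rng.choice([True, False])
    return (vector_id, sex, created, glasses)

rng = random.Random(0)  # seeded so repeated runs produce the same rows
rows = []
for start, end in batch_ranges(VEC_NUM, BASE_LEN):
    # In a real import, the vectors in [start, end) would be inserted into
    # Milvus here, and the ids Milvus returns would replace these placeholder ids.
    rows.extend(make_attribute_row(i, rng) for i in range(start, end))

print(list(batch_ranges(VEC_NUM, BASE_LEN)))
print(len(rows))
```

Each tuple in `rows` corresponds to one record that would be written to the PostgreSQL table, keyed by the same id that identifies the vector in Milvus.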
Before searching vectors, edit the following parameters in `mixed_query.py` to match your runtime environment.
Parameter | Description |
---|---|
`QUERY_PATH` | Path of local storage of query vectors. |
`MILVUS_TABLE` | Name of the table to search in Milvus. Use the same table name set for Milvus in `mixed_import.py`. |
`PG_TABLE_NAME` | Name of the table to query in PostgreSQL. Use the same table name set for PostgreSQL in `mixed_import.py`. |
`SERVER_ADDR` | Address of Milvus server. |
`SERVER_PORT` | Port of Milvus server. |
`PG_HOST` | Address of PostgreSQL server. |
`PG_PORT` | Port of PostgreSQL server. |
`PG_USER` | Username to use in PostgreSQL. |
`PG_PASSWORD` | Password to use in PostgreSQL. |
`PG_DATABASE` | Database to use in PostgreSQL. |
`TOP_K` | Number of most similar result vectors to return (the top k). |
`DISTANCE_THRESHOLD` | Threshold used to filter the top k result vectors. Default value is 1. Only vectors with a Euclidean distance smaller than this threshold are kept. |
`mixed_query.py` accepts the following command-line arguments:
Variable | Description |
---|---|
`-n` / `--num` | Defines the ordinal rank of the query vector in the query vector set. |
`-s` / `--sex` | Defines the gender of the human face: `male` or `female`. |
`-t` / `--time` | Specifies the query time range, e.g. `[2019-04-05 00:10:21, 2019-05-20 10:54:12]`. |
`-g` / `--glasses` | Defines whether the human face wears glasses: `True` or `False`. |
`-q` / `--query` | Starts the query execution. |
`-v` / `--vector` | Retrieves the vectors corresponding to the ids entered. |
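The `-t` value is a bracketed pair of timestamps, so it needs to be split and parsed before it can be used as a filter. The following is one possible way to do that with the standard library; the function name `parse_time_range` is hypothetical, and the actual script may handle the string differently.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_time_range(text):
    """Parse '[start, end]' into a (start, end) tuple of datetime objects."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("time range must be wrapped in square brackets")
    parts = [p.strip() for p in inner[1:-1].split(",")]
    if len(parts) != 2:
        raise ValueError("time range must contain exactly two timestamps")
    start, end = (datetime.strptime(p, TIME_FORMAT) for p in parts)
    if start > end:
        raise ValueError("start of time range is after its end")
    return start, end


start, end = parse_time_range("[2019-04-05 00:10:21, 2019-05-20 10:54:12]")
print(start, end)
```

Validating the range up front means a malformed `-t` value fails with a clear message before any database query is attempted.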
To search for the top k vectors most similar to the vector ranked 0 in the query vector set, keeping only results whose gender is `male` and which were generated within the time range [2019-05-01, 2019-07-12]:
python3 mixed_query.py -n 0 -s male -t '[2019-05-01 00:00:00, 2019-07-12 00:00:00]' -q
To search for the top k vectors most similar to the vector ranked 20 in the query vector set, keeping only results whose gender is `female` and whose face wears no glasses:
python3 mixed_query.py -n 20 -s female -g False -q
To search for the top k vectors most similar to the vector ranked 100 in the query vector set, keeping only results whose gender is `female`, whose face wears glasses, and which were generated within the time range [2019-05-01 15:15:05, 2019-07-30 11:00:00]:
python3 mixed_query.py -n 100 -s female -g True -t '[2019-05-01 15:15:05, 2019-07-30 11:00:00]' -q
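On the PostgreSQL side, filters like the ones in these commands translate into a `WHERE` clause restricted to the ids returned by the vector search. The sketch below builds a parameterized query in the `%s` placeholder style used by psycopg2, without connecting to a database. The column names (`ids`, `sex`, `get_time`, `is_glasses`), the table name `mixed_table`, and the function `build_filter_query` are assumptions for illustration and may not match the schema created by the import script.

```python
import re

def build_filter_query(table, ids, sex=None, time_range=None, glasses=None):
    """Return (sql, params) selecting the rows among `ids` that match the filters."""
    # Table names cannot be passed as query parameters, so validate before
    # interpolating. psycopg2's sql.Identifier is a safer alternative.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("unexpected table name")
    clauses = ["ids = ANY(%s)"]   # restrict to ids returned by the vector search
    params = [list(ids)]
    if sex is not None:
        clauses.append("sex = %s")
        params.append(sex)
    if time_range is not None:
        clauses.append("get_time BETWEEN %s AND %s")
        params.extend(time_range)
    if glasses is not None:
        clauses.append("is_glasses = %s")
        params.append(glasses)
    sql = f"SELECT ids FROM {table} WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_filter_query(
    "mixed_table",
    ids=[11, 42, 97],
    sex="female",
    time_range=("2019-05-01 15:15:05", "2019-07-30 11:00:00"),
    glasses=True,
)
print(sql)
print(params)
```

With an open psycopg2 cursor, `cursor.execute(sql, params)` would run the query. Filters that were not supplied on the command line are simply left out of the clause.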
To retrieve a vector by its vector id:
python3 mixed_query.py -v 237434787867
In conclusion, this solution demonstrates a hybrid search over structured and unstructured data using Milvus and PostgreSQL. Milvus can be integrated with relational databases in this way to support hybrid search in a variety of scenarios.