serengil/tensorflow-101

Voyager Face embedding storing

Raghucharan16 opened this issue · 15 comments

What exactly is this piece of code doing?

for i in range(len(embeddings), target_size):
    embedding = np.random.uniform(-5, +5, num_dimensions)
    embeddings.append(embedding)
    img_names.append(f'synthetic_{i}.jpg')
print(f'There are {len(embeddings)} embeddings available')

Also, can I just add my own face embeddings without creating synthetic data? If so, how can I do that?
Thank you.

Where did you get this?

It is for adding synthetic data. I wanted to test some ANN algorithms on very large data. Of course, you do not have to have that block. Working with just real data is better.

On your blog.
How can I do that? That is, how can I add embeddings with the num_dimensions parameter?
Also, when I run it in VS Code, it does not show the result picture.
I had this code:

# built-in dependencies
import os
import time

# third-party dependencies
import numpy as np
import cv2
import matplotlib.pyplot as plt
from deepface import DeepFace
from voyager import Index,Space
model_name = 'Facenet'
detector_backend = 'mtcnn'
num_dimensions = 128 # Facenet produces 128-dimensional vectors 

img_names = []
embeddings = []

for dirpath, dirnames, filenames in os.walk('dbmod'):
    for filename in filenames:
        if '.jpg' in filename:
            try:

                img_name = f'{dirpath}{filename}'
                
                embedding_objs = DeepFace.represent(
                    img_name, model_name=model_name, detector_backend=detector_backend
                )
                embedding = embedding_objs[0]['embedding']
                embedding=embedding,num_dimensions
                embeddings.append(embedding)
                img_names.append(img_name)
            except Exception as e:
                pass
# target_size = 10000
# for i in range(len(embeddings), target_size):
#     embedding = np.random.uniform(-5, +5, num_dimensions)
#     embeddings.append(embedding)
#     img_names.append(f'synthetic_{i}.jpg')
print(f'There are {len(embeddings)} embeddings available')
index = Index(Space.Euclidean, num_dimensions=num_dimensions)
embeddings_np = np.array(embeddings)
tic = time.time()

index.add_items(embeddings_np)

toc = time.time()

print(
    f'{embeddings_np.shape[0]} embeddings are stored in voyager in '
    f'{round(toc-tic, 2)} seconds'
)
target_img = 'sample.jpg'
embedding_obj = DeepFace.represent(
    target_img, model_name=model_name, detector_backend=detector_backend
)
target_embedding = embedding_obj[0]['embedding']
tic = time.time()

neighbors, distances = index.query(target_embedding, k=3)

toc = time.time()

print(
    f'Index search completed in {toc-tic} seconds among '
    f'{embeddings_np.shape[0]} vectors'
)
target_img = cv2.imread('Madhursample.jpg')

for i, neighbor in enumerate(neighbors):
    img_name = img_names[neighbor]
    label = img_name.split('/')[-1]
    distance = distances[i]
    print(
        f'{i+1}. nearest neighbor is {label} with distance {round(distance)}'
    )

I'm getting this output and error:

There are 0 embeddings available
Traceback (most recent call last):
  File "/home/narravenkataraghucharan/Desktop/ufacedetection/face_voyager.py", line 87, in <module>
    index.add_items(embeddings_np)
ValueError: Input array was expected to have rank 2, but had rank 1.
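
The output above is consistent with the embeddings list being empty: an array built from an empty list is one-dimensional, while the traceback says `add_items` expected rank 2. Two lines in the posted script may be worth checking. `f'{dirpath}{filename}'` joins the directory and file name without a path separator, and `embedding=embedding,num_dimensions` rebinds `embedding` to a two-element tuple. Because the `except` branch only does `pass`, any resulting failure is skipped silently. The sketch below uses only the standard library and made-up values to illustrate both points; `check_embeddings` is a hypothetical helper, not part of deepface or voyager.

```python
import os

# 1) Joining a directory and a file name without a separator.
dirpath, filename = "dbmod", "person1.jpg"     # made-up values
broken_path = f"{dirpath}{filename}"            # 'dbmodperson1.jpg'
joined_path = os.path.join(dirpath, filename)   # separator inserted for you

# 2) A trailing ", name" turns the right-hand side into a tuple.
num_dimensions = 128
embedding = [0.0] * num_dimensions              # stand-in for a real embedding
as_tuple = embedding, num_dimensions            # (list, 128), not a vector


def check_embeddings(embeddings, num_dimensions):
    """Hypothetical guard: fail early with a clear message before indexing."""
    if not embeddings:
        raise ValueError("no embeddings were collected; check paths and errors")
    for position, vector in enumerate(embeddings):
        if len(vector) != num_dimensions:
            raise ValueError(
                f"embedding {position} has length {len(vector)}, "
                f"expected {num_dimensions}"
            )
    return True
```

Calling a guard like this right before building the index, and logging the exception instead of passing over it, would surface the underlying problem rather than the later rank error.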

For me, storing 110 face embeddings in voyager takes more than a minute, although the search was fast. Could you check what went wrong?
This is the code:

import os
import time
import logging
import numpy as np
import cv2
from deepface import DeepFace
from voyager import Index, Space

model_name = 'Facenet'
detector_backend = 'mtcnn'
num_dimensions = 128  # Facenet produces 128-dimensional vectors 

img_names = []
embeddings = []

for dirpath, dirnames, filenames in os.walk('dbmod'):
    for filename in filenames:
        if '.jpg' in filename:
            try:
                img_name = os.path.join(dirpath, filename)
                
                # Generate embedding
                embedding_objs = DeepFace.represent(img_name, model_name=model_name, detector_backend=detector_backend)
                embedding = embedding_objs[0]['embedding']
                logging.debug(f"Successfully generated embedding for {img_name}")
                
                # Append to lists
                embeddings.append(embedding)
                img_names.append(img_name)
            except Exception as e:
                logging.error(f"Error generating embedding for {img_name}: {e}")
                pass

# Print number of embeddings
print(f'There are {len(embeddings)} embeddings available')

# Initialize Voyager index
index = Index(Space.Euclidean, num_dimensions=num_dimensions)

# Add embeddings to index
embeddings_np = np.array(embeddings)
index.add_items(embeddings_np)

# Process target image
target_img = 'sample.jpg'
embedding_obj = DeepFace.represent(target_img, model_name=model_name, detector_backend=detector_backend)
target_embedding = embedding_obj[0]['embedding']

# Perform index search
neighbors, distances = index.query(target_embedding, k=1)

# Print results
print(f'Index search completed among {embeddings_np.shape[0]} vectors')

# Display nearest neighbors
for i, neighbor in enumerate(neighbors):
    img_name = img_names[neighbor]
    label = os.path.basename(img_name)
    distance = distances[i]
    print(f'{i+1}. Nearest neighbor is {label} with distance {round(distance)}')

Nothing! Creating an index takes time, but it offers fast search.
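
One way to see where the minute goes is to time each stage on its own, since the second script above does not measure the embedding loop or the `add_items` call separately. A minimal sketch, assuming nothing about deepface or voyager: `timed` is a hypothetical helper, and `slow_stage` stands in for whatever call is being measured.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    tic = time.perf_counter()
    result = func(*args, **kwargs)
    toc = time.perf_counter()
    return result, toc - tic


def slow_stage(n):
    """Stand-in for a real stage such as embedding extraction or indexing."""
    return sum(i * i for i in range(n))


result, seconds = timed(slow_stage, 1000)
print(f"stage finished in {seconds:.4f} seconds")
```

Wrapping the embedding loop and the index insertion separately in this way would show which of the two dominates the total.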

So it can't be any faster than this? Even for a mere 100 images, it takes 1 minute to store?

If you have 100 images, then you should not use an index method. deepface's find function performs better.

Index methods should be adopted if you have 1M+ samples.
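
For a few hundred vectors, an exhaustive scan is cheap. The sketch below is not deepface's `find` implementation; it only illustrates brute-force nearest-neighbor search with Euclidean distance on made-up 3-dimensional vectors. `nearest_neighbors` is a hypothetical helper.

```python
import math


def nearest_neighbors(query, vectors, k=1):
    """Return the k closest (index, distance) pairs by Euclidean distance."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Made-up 3-dimensional vectors standing in for face embeddings.
database = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
query = [0.9, 1.1, 1.0]

matches = nearest_neighbors(query, database, k=2)
```

Every query compares against every stored vector, so the cost grows linearly with the database size, which is where an index starts to pay off on very large collections.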

Yes, deepface's find function is indeed much faster, but for my data it is not giving accurate results.

Hey @serengil, I have one small task to do. Would you give me a hand, if possible?
The task: I have multiple folders containing faces. Say folder1 has faces A, B, C, D and folder2 has faces A, D, E, F. I need to iterate over the two folders (there will be more in practice) and save the unique faces in another folder, say unique_faces_folder. Before adding a face, I verify it with deepface's verify method. I also tried the find method on unique_faces_folder, but I'm getting false positives, and the verify method takes too much time. What would be the suggested way to improve this and solve the use case? I'm using yolov9 for face detection. I also tried voyager and annoy for the first nearest neighbor, but those give mixed results.

The best way to do that is to use the verify function. It will take some time.
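
The folder-merging task described above amounts to a greedy pass: compare each new face against the faces already kept, and keep it only if none of them match. A minimal sketch follows, with `is_same_person` as a hypothetical stand-in for a pairwise check such as a verify call. The toy version here compares made-up vectors against an arbitrary threshold, which is an assumption and not a recommended value.

```python
import math


def is_same_person(a, b, threshold=0.5):
    """Toy pairwise check: Euclidean distance under an arbitrary threshold."""
    return math.dist(a, b) < threshold


def collect_unique(faces, same=is_same_person):
    """Greedy de-duplication: keep a face only if it matches no kept face."""
    unique = []
    for name, vector in faces:
        if not any(same(vector, kept_vector) for _, kept_vector in unique):
            unique.append((name, vector))
    return unique


# Made-up 2-dimensional vectors; the two folders share persons A and D.
folder1 = [("A1", [0.0, 0.0]), ("B1", [3.0, 0.0]), ("D1", [0.0, 3.0])]
folder2 = [("A2", [0.1, 0.0]), ("D2", [0.0, 3.1]), ("E2", [6.0, 6.0])]

unique_faces = collect_unique(folder1 + folder2)
unique_names = [name for name, _ in unique_faces]
```

Each new face is compared against every kept face, so the number of pairwise checks grows with the size of the unique set, which is consistent with this approach being slow.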

Yeah, verify gave me better results, but it takes some time. Why can't we get the same results with the find function, since it is very fast compared to iterative checking?

We discussed this yesterday: verify and find do the same thing; find stores its outcomes in a pickle file to restore later.
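
The caching idea mentioned here can be illustrated as follows. This is a sketch of the general pattern (compute once, pickle, reload on later calls), not deepface's actual code; `load_or_build`, `fake_embed`, and the file name are hypothetical.

```python
import os
import pickle
import tempfile


def fake_embed(name):
    """Stand-in for an expensive embedding call."""
    return [float(len(name)), 0.0]


def load_or_build(cache_path, names, embed=fake_embed):
    """Reuse pickled representations if present, otherwise compute and save."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    representations = {name: embed(name) for name in names}
    with open(cache_path, "wb") as handle:
        pickle.dump(representations, handle)
    return representations


cache_path = os.path.join(tempfile.mkdtemp(), "representations.pkl")
first = load_or_build(cache_path, ["a.jpg", "bb.jpg"])   # computes and saves
second = load_or_build(cache_path, ["a.jpg", "bb.jpg"])  # loads from disk
```

A cache written this way does not notice files added afterwards; it has to be deleted or rebuilt for new entries to appear.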

Yeah, we discussed it, but for me the results are not the same. I'm hoping InsightFace's buffalo_l model will give better results. Thanks for your patience; we appreciate your work.

Hey, just my opinion: a vector store like Milvus can give you dynamic indexing and storage, so you need not rebuild the index every time. It also has search and indexing parameters you can configure. Check out Milvus.

@darkar18 Thanks for the suggestion, I'll look into it.