How to align and get the begin and end time of the query in database?
980202006 opened this issue · 4 comments
If I query the start and end positions of an audio, through faiss.search, it is possible that top n is a non-continuous value, how can I align it?
@980202006 For the details, here is one example.
We assume a Query as a sequence of 3 contiguous segments.
Reference DB: [z0, z1, z2, z3,...,z10, z11, z12,...z20]
Query: [z1', z2', z3']
The top N (e.g. N=5)
results of faiss.search
for the query, for example:
I(z1') = [1, 2, 11, 3, 15] # sorted top-5 result indices
D(z1') = [0.1, 0.2, 0.3, 0.4, 0.5] # distances are not required in this stage.
I(z2') = [11, 2, 1, 3, 12]
D(z2') = ...
I(z3') = [3, 2, 12, 4, 11]
D(z3') = ...
Here the result [11,2,1,3,12]
of the second segment's result I(z2')
implies that the sequence would start from '[10,1,0,2,11]. Based on this idea, we perform
offset-compensation` for each segment-wise search result.
# time-offset compensation
I'(z1) = I(z1) # [1, 2, 11, 3, 15]
I'(z2') = I(z2') - 1 # [10, 1, 0, 2, 11]
I'(z3') = I(z3') - 2 # [1, 0, 10, 2, 9]
We then perform np.unique
to reduce duplicated candidates
# unique
c_start_index = unique_set(all_I's) # [0, 1, 2, 3, 9, 10, 11, 15]
# from the `c_start_index`, we get the final candidates, c
c = [[0,1,2], [1,2,3], [2,3,4], [9,10,11], [10,11,12], [11,12,13], [15,16,17]]
We calculate the total distance (called score, but actually a penalty: smaller value is better) for each candidate sequence:
# d is distance
score(c[0]) = score([0,1,2]) = d(z1', z0) +d(z2', z1) + d(z3',z2)
score(c[1]) = score([1,2,3]) = d(z1', z1) + d(z2', z2) + d(z3', z3)
...
score(c[7]) = score([15,16,17]) = ...
Finally we take argmin(score)
.
In the paper, I described this using argmax
of cosine-similarity
instead of argmin
of euclidean-distance
. But in the end they are the same thing.
This is a great example, I want to ask regarding such cases, in order to evaluate, gt_ids = test_ids + dummy_db_shape[0]
in the original code considers same number of fp for the db and queries, but in this case it is not, then I should find at what time it starts and find the idx for the start point fp to find ground truths right?
Also this code fake_recon_index, index_shape = load_memmap_data( emb_dummy_dir, 'dummy_db', append_extra_length=query_shape[0], display=False) fake_recon_index[dummy_db_shape[0]:dummy_db_shape[0] + query_shape[0], :] = db[:, :]
only works if query_shape[0] = db_shape[0] otherwise I should use db_shape[0] am i right?
-
If what I remember is correct,
gt_ids = test_ids + dummy_db_shape[0]
was required because our DB was always constructed as dummy_db first, and test_db follows the dummy_db indices.
If you set dumm_db as empty (using np.empty()), then simplygt_ids = test_ids
. -
If you don't use
dummy_db
, again, it will be simplyfake_recon_index = db
. The reason why I use the on-disk solution of memmap variables like fake_recon_index is to save memory. Imagine ifdummy_db
was your actual DB with size of 2TB!