mimbres/neural-audio-fp

How to align and get the begin and end time of the query in database?

980202006 opened this issue · 4 comments

If I query the start and end positions of an audio, through faiss.search, it is possible that top n is a non-continuous value, how can I align it?

The simplest answer is matcher.

@980202006 For the details, here is one example.
We assume a Query as a sequence of 3 contiguous segments.

Reference DB: [z0, z1, z2, z3,...,z10, z11, z12,...z20]
Query: [z1', z2', z3']

The top N (e.g. N=5) results of faiss.search for the query, for example:

I(z1') = [1, 2, 11, 3, 15] # sorted top-5 result indices
D(z1') = [0.1, 0.2, 0.3, 0.4, 0.5] # distances are not required in this stage.

I(z2') = [11, 2, 1, 3, 12]
D(z2') = ...
 
I(z3') = [3, 2, 12, 4, 11]
D(z3') = ...

Here the result [11,2,1,3,12] of the second segment's result I(z2') implies that the sequence would start from '[10,1,0,2,11]. Based on this idea, we perform offset-compensation` for each segment-wise search result.

# time-offset compensation
I'(z1) = I(z1)  # [1, 2, 11, 3, 15]
I'(z2') = I(z2') - 1 # [10, 1, 0, 2, 11]
I'(z3') = I(z3') - 2 # [1, 0, 10, 2, 9]

We then perform np.unique to reduce duplicated candidates

# unique
c_start_index = unique_set(all_I's) # [0, 1, 2, 3, 9, 10, 11, 15]
# from the `c_start_index`, we get the final candidates, c
c = [[0,1,2], [1,2,3], [2,3,4], [9,10,11], [10,11,12], [11,12,13], [15,16,17]] 

We calculate the total distance (called score, but actually a penalty: smaller value is better) for each candidate sequence:

# d is distance
score(c[0]) = score([0,1,2]) = d(z1', z0) +d(z2', z1) + d(z3',z2)
score(c[1]) = score([1,2,3]) = d(z1', z1) + d(z2', z2) + d(z3', z3)
...
score(c[7]) = score([15,16,17]) = ...

Finally we take argmin(score).

In the paper, I described this using argmax of cosine-similarity instead of argmin of euclidean-distance. But in the end they are the same thing.

This is a great example, I want to ask regarding such cases, in order to evaluate, gt_ids = test_ids + dummy_db_shape[0] in the original code considers same number of fp for the db and queries, but in this case it is not, then I should find at what time it starts and find the idx for the start point fp to find ground truths right?
Also this code fake_recon_index, index_shape = load_memmap_data( emb_dummy_dir, 'dummy_db', append_extra_length=query_shape[0], display=False) fake_recon_index[dummy_db_shape[0]:dummy_db_shape[0] + query_shape[0], :] = db[:, :]
only works if query_shape[0] = db_shape[0] otherwise I should use db_shape[0] am i right?

@mhmd-mst

  • If what I remember is correct, gt_ids = test_ids + dummy_db_shape[0] was required because our DB was always constructed as dummy_db first, and test_db follows the dummy_db indices.
    If you set dumm_db as empty (using np.empty()), then simplygt_ids = test_ids.

  • If you don't use dummy_db, again, it will be simply fake_recon_index = db. The reason why I use the on-disk solution of memmap variables like fake_recon_index is to save memory. Imagine if dummy_db was your actual DB with size of 2TB!