roberto-bayardo/google-all-pairs-similarity-search

How to use two data sets to compute their intersection?

GoogleCodeExporter opened this issue · 1 comments

Roberto: Sorry for the late reply but for whatever reason, the first
notification about your Jan 2nd question got lost in my spam filter.
Since you closed the original ticket I am opening a new one with
clarifications.

What I meant is the ability to provide as input not one dataset but two
datasets.

In this setting, one dataset would be a "reference" and the second
a "query" dataset.
The goal would be to find all items in the "query" set that are similar to
items in the "reference" dataset above a certain threshold: basically
returning the similarity intersection between the two sets, as opposed to
the current behavior, where only pairs within the same set are considered. I
guess one way could be to merge the sets and discard pairs returned from
the same set, though that does seem pretty naive.
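
For illustration, the merge-and-discard idea could look roughly like the sketch below. This is not code from this project: `all_pairs` is a hypothetical brute-force stand-in for the real all-pairs search, records are assumed to be sets of feature ids with binary weights, and the function names and the "ref"/"qry" tags are made up for the example.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sets of feature ids (binary weights)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def all_pairs(records, threshold):
    """Hypothetical brute-force stand-in for an all-pairs similarity search.

    `records` is a list of (record_id, feature_set) tuples.
    """
    for (id_a, feats_a), (id_b, feats_b) in combinations(records, 2):
        sim = cosine(feats_a, feats_b)
        if sim >= threshold:
            yield id_a, id_b, sim


def cross_pairs_by_merging(reference, query, threshold):
    """Merge both datasets, run all-pairs, keep only cross-dataset pairs."""
    # Tag every record id with the dataset it came from.
    merged = [(("ref", key), feats) for key, feats in reference.items()]
    merged += [(("qry", key), feats) for key, feats in query.items()]

    result = []
    for (src_a, id_a), (src_b, id_b), sim in all_pairs(merged, threshold):
        if src_a == src_b:
            continue  # both records come from the same dataset: discard
        if src_a == "qry":
            id_a, id_b = id_b, id_a  # report as (reference id, query id)
        result.append((id_a, id_b, sim))
    return result
```

The obvious cost is that the search still computes and then throws away every within-dataset pair, which is why the approach feels naive when one dataset is much larger than the other.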

Original issue reported on code.google.com by pombreda...@gmail.com on 26 Jan 2010 at 6:59

Sorry for the incredibly late reply; I obviously need to set up notifications.

Yes, your solution of merging the data would work. You could also implement this
by having the algorithm build an index only over the "reference" dataset. Then
you could iterate over the elements of the "query" dataset and probe the index
as before.
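
A minimal sketch of that second approach, under some loud assumptions: it uses a plain inverted index with binary feature weights and cosine similarity, and it leaves out the pruning optimizations a real all-pairs implementation would apply. The names `build_reference_index` and `probe_with_queries` are hypothetical and do not come from this project's code.

```python
from collections import defaultdict
from math import sqrt


def build_reference_index(reference):
    """Build an inverted index (feature id -> reference ids) over the
    "reference" dataset only."""
    index = defaultdict(list)
    for ref_id, feats in reference.items():
        for feature in feats:
            index[feature].append(ref_id)
    return index


def probe_with_queries(index, reference, query, threshold):
    """Probe the reference index with each query record.

    Query records are never indexed, so query-query pairs are never
    generated in the first place.
    """
    matches = []
    for qry_id, q_feats in query.items():
        # Count shared features per candidate reference record.
        overlap = defaultdict(int)
        for feature in q_feats:
            for ref_id in index.get(feature, ()):
                overlap[ref_id] += 1
        # Turn overlap counts into cosine similarity (binary weights).
        for ref_id, shared in overlap.items():
            sim = shared / sqrt(len(q_feats) * len(reference[ref_id]))
            if sim >= threshold:
                matches.append((ref_id, qry_id, sim))
    return matches
```

Compared with merging, this never generates reference-reference or query-query candidates, so no work is spent on pairs that would be discarded anyway.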

Original comment by roberto....@gmail.com on 15 Sep 2010 at 11:27

  • Changed state: WontFix