OpenMined/PSI

Intersection size is bigger than the server size

chesterxgchen opened this issue · 5 comments

Description

The size of the intersection of client and server items should be less than or equal to both the client's item count and the server's item count. But with the latest code, the intersection is in some cases larger than the server size.

If the intersection indices exceed the client size, we can simply drop the indices greater than the client item count, but if the intersection is larger than the server size, the result is probably wrong.

For example:
client size = 263
server size = 132
The resulting intersection size = 167 > 132, which shouldn't be possible.

How to Reproduce

  1. Go to https://github.com/chesterxgchen/psi_mpc_related/blob/main/psi/psi_intersect.ipynb
  2. Import psi, version > 1.0.3 (cell 4)
  3. Run cell 5
  4. Scroll down to cell 8 to run the test with emails
  5. See output

Expected Behavior

Intersection size <= min(client items, server items) = min(263, 132) = 132,
but it returns 167.

System Information

  • OS: Ubuntu
  • OS Version: 20.04
  • Language Version: Python 3.8.16
  • Package Manager Version: pip3

Additional Context

Found this during NVFLARE testing.

@s0l0ist sorry, there is another side effect from the PR changes.

@chesterxgchen, your list of client inputs contains duplicate entries and is not a unique set. Therefore, the intersection results are correct and point to indices of the same matching entry multiple times which explains why you see this discrepancy.
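The effect described above can be reproduced without the library at all. The following is a minimal sketch, not the PSI API: `matching_client_indices` is a hypothetical stand-in that returns the index of every client entry found on the server, and the email values are made up for illustration.

```python
def matching_client_indices(client_items, server_items):
    """Hypothetical stand-in for a PSI result: indices of client items found on the server."""
    server_set = set(server_items)
    return [i for i, item in enumerate(client_items) if item in server_set]


# Illustrative data only: the client list repeats "a@x.org" three times.
server_items = ["a@x.org", "b@x.org"]
client_items = ["a@x.org", "a@x.org", "b@x.org", "c@x.org", "a@x.org"]

indices = matching_client_indices(client_items, server_items)
print(indices)       # [0, 1, 2, 4]
print(len(indices))  # 4, larger than len(server_items) == 2
```

Every duplicate on the client side contributes its own index, so the index count can exceed the server size even though only two distinct values match.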

Thanks for spotting it. Let me check; sorry, I did not notice it.

Actually,
iset = set(intersection)
print("interset index = ", iset)
print("interset size = ", len(iset))
the intersection size shouldn't include any duplicates, since the set() operation has already been applied.
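One possible reason `set()` does not help here: the result holds indices into the client list, and two positions holding the same value are still two distinct indices. A small sketch with made-up data (the `intersection` list is an assumed result, not real library output):

```python
client_items = ["a@x.org", "a@x.org", "b@x.org"]
# Assumed indices returned for a server holding a@x.org and b@x.org.
intersection = [0, 1, 2]

unique_indices = set(intersection)                       # deduplicates positions
unique_values = {client_items[i] for i in intersection}  # deduplicates matched values

print(len(unique_indices))  # 3
print(len(unique_values))   # 2
```

Deduplicating the indices leaves the count unchanged; only mapping the indices back to values, or deduplicating the inputs beforehand, removes the repeats.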

But let me make sure there are no duplicates in the client and server items.

After making sure the client and server items are unique, it works. Thanks for the quick response.
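For anyone hitting the same thing, a minimal sketch of deduplicating both lists before running the protocol. `dedupe` is a hypothetical helper, the data is illustrative, and a plain set intersection stands in for the PSI result:

```python
def dedupe(items):
    """Drop repeated entries while keeping first-seen order."""
    return list(dict.fromkeys(items))


raw_client = ["a@x.org", "a@x.org", "b@x.org", "c@x.org", "a@x.org"]
raw_server = ["b@x.org", "a@x.org", "b@x.org"]

client_items = dedupe(raw_client)
server_items = dedupe(raw_server)

# A plain set intersection stands in for the PSI result here.
common = set(client_items) & set(server_items)

# With unique inputs, the size bound from the issue description holds.
assert len(common) <= min(len(client_items), len(server_items))

print(client_items)  # ['a@x.org', 'b@x.org', 'c@x.org']
print(server_items)  # ['b@x.org', 'a@x.org']
```

`dict.fromkeys` preserves insertion order, so the indices returned afterwards refer to positions in the deduplicated list rather than the raw one.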