Intersection size is bigger than the server size
chesterxgchen opened this issue · 5 comments
Description
The intersection of the client and server items should be no larger than either the client's item count or the server's item count. With the latest code, however, the intersection is in some cases larger than the server size.
If the intersection is larger than the client size, we can simply drop the indices greater than the client item count, but if the intersection is larger than the server size, the result is probably wrong.
For example:
client size = 263
server size = 132
The resulting intersection size is 167 > 132, which shouldn't be possible.
How to Reproduce
- Go to https://github.com/chesterxgchen/psi_mpc_related/blob/main/psi/psi_intersect.ipynb
- Import psi, version > 1.0.3 (cell 4)
- Run cell 5
- Scroll down to cell 8 to run the test with emails
- See output
Expected Behavior
Intersection size <= min(client items, server items) = min(263, 132) = 132,
but it returns 167.
System Information
- OS: Ubuntu
- OS Version: 20.04
- Language Version: Python 3.8.16
- Package Manager Version: Pip3
Additional Context
Found this during NVFLARE testing.
@s0l0ist sorry, there is another side effect from the PR changes.
@chesterxgchen, your list of client inputs contains duplicate entries and is not a unique set. The intersection results are therefore correct: they point to the indices of the same matching entry multiple times, which explains the discrepancy you see.
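The effect described above can be reproduced without the psi library at all. The following is only a plain-Python stand-in, assuming the intersection is reported as indices into the client list; `matching_indices` and the sample addresses are hypothetical and not part of any real API.

```python
def matching_indices(client_items, server_items):
    """Hypothetical stand-in for PSI: indices of client items the server also holds."""
    server_set = set(server_items)
    return [i for i, item in enumerate(client_items) if item in server_set]


# The client list repeats "a@x.com" three times; the server holds two items.
client_items = ["a@x.com", "b@x.com", "a@x.com", "a@x.com", "c@x.com"]
server_items = ["a@x.com", "b@x.com"]

indices = matching_indices(client_items, server_items)
print(indices)            # [0, 1, 2, 3]

# set() does not help here: the indices are distinct even though
# three of them refer to the same duplicated value.
print(len(set(indices)))  # 4, larger than len(server_items) == 2
```

Each duplicate occupies its own position in the client list, so every copy contributes a separate index to the result.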
Thanks for spotting it. Let me check; sorry, I did not notice it.
Actually,
iset = set(intersection)
print("interset index = ", iset)
print("interset size = ", len(iset))
The intersection shouldn't contain any duplicates, since the set() operation has already been applied.
But let me make sure there are no duplicates in the client and server items.
After making sure the client and server items are unique, it works. Thanks for the quick response.
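For anyone hitting the same symptom, one way to make the inputs unique before running the intersection is sketched below. It is a minimal sketch in plain Python, not the notebook's actual code; `dedupe` and the sample data are assumptions introduced for illustration.

```python
def dedupe(items):
    """Drop repeated entries while keeping first-seen order (hypothetical helper)."""
    return list(dict.fromkeys(items))


client_items = ["a@x.com", "b@x.com", "a@x.com", "a@x.com", "c@x.com"]
server_items = ["a@x.com", "b@x.com"]

unique_client = dedupe(client_items)
unique_server = dedupe(server_items)

# With unique inputs, the number of matches cannot exceed either side.
matches = [item for item in unique_client if item in set(unique_server)]
assert len(matches) <= min(len(unique_client), len(unique_server))
print(unique_client, len(matches))
```

`dict.fromkeys` is used instead of `set()` so the original order is preserved, which matters if the returned indices are later mapped back to items.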