OpenMined/PSI

To find out server side intersection, is possible to use bloom filter bit operation ?

chesterxgchen opened this issue · 3 comments

Question

In the federated learning case, the intersections needs to be calculated for both sites. using PSI to achieve this, requires reverse process. If we want to improve the performance, can we leverage the bit operation on filter rather than reverse the PSI setup, request, response process.

Further Information

Use Case: Federated Vertical learning, two sites has different user records, we need find the match user Ids before we can training the mode. Both Sites will need to know the intersection.
Suppose we use PSI to find the match user Ids. Take site-1 as PSI Server (S), site-2 as PSI client (C). The current process

Forward Pass:
S (setup msg) --> bloom filter ==> delivery ==> C (receive setup bloom filter)
S (receive request) <== request <== C (prepare request msg)
S (response msg) ==> deliver ===> C ( receive response, compare with the setup bloom filter to find the intersection)

at the end of forward Pass, Site-1 doesn't know intersection, Site-2 knows the intersection.

To calculate the S intersection, we can reverse the process, now takes Site-2 as Server Role, use the intersection as the Items. Repeat above process, called Reverse Pass. then Site-1 knows the intersection as well.

This works, but I am wondering if there is a faster way to do this. Looking at above Forward Pass.

Step 3 Client: Intersection calculation,
One we have the intersection, we can derive an new bloom filter based on the setup bloom filter where only the intersection bits are 1, others are 0. Let's call it intersect bloom filter

On the reverse pass, instead repeat above step, can we simply sent the intersect bloom filter back to Server.

C (intersect bloom filter) ===> S (receive intersect bloom filter)

Server
loop all items to check against the intersect bloom filter to figure out the intersection.

The question, is this reasonable and secure ? If yes, can you add these in the Python APIs ?

Screenshots

None

System Information

N/A

Additional Context

Add any other context about the problem here.

@s0l0ist can you check if this make sense ?

My short answer is, no I don't think this is possible, but I might be misunderstanding the proposed solution.

Every intersection calculation in the protocol should use a new key and a new instance of the backing data structure (GCS/Bloomfilter). If you do not, then it allows the client to learn from the server (over many intersection requests) and can eventually enumerate/bruteforce the server's set.

Reusing the bloom filter (keeping the 1s as markers for the intersection and 0s for everything else) means you'd 1) be reusing the same key and 2) leak the client (site-2) intersections to server (site-1). The server now knows something from the client's set where if it used the 2-pass suggestion it could not.

@s0l0ist Nick, thanks for the clear explanation. This make senses.