OpenMined/PSI

PSI Future Directions

Daniel-Liu-c0deb0t opened this issue · 7 comments

Welcome to the OpenMined Private Set Intersection repository! Thank you for your interest in contributing to the PSI project.

Description

The OpenMined PSI team works on the core C++ code for Private Set Intersection, which uses Bloom Filters and elliptic-curve Diffie-Hellman (ECDH). Additionally, we want to make our implementation accessible to downstream projects (like PyVertical), so we provide bindings in a variety of languages, from Go to Javascript.

Potential Avenues of Future Work

We are always looking forward to new contributors! Here are some potential areas where you could start contributing:

  • Unit and integration tests - help verify that our code is secure and reliable.
  • Documentation - help make our code more user-friendly. The documentation needs to be updated for the newly added Golomb Compressed Sets.
  • New language bindings - help make PSI more accessible. Currently, we are interested in Java bindings, and we need to update all bindings to handle Golomb Compressed Sets.
  • New features and algorithm improvements - any new features or improvements to our algorithm would be beneficial. There are many applications of Private Set Intersection, and implementing new applications is always welcome.

Feel free to expand on this issue with more ideas! Open a new issue or pull request to get feedback from other contributors.

Hey @Daniel-Liu-c0deb0t ,

The PyVertical team have been discussing some additional features. We don't know if these are possible 😅

I'll briefly explain them here, but we can open separate issues or discuss on slack if we need to talk in more detail.

CC @H4LL @daler3

Multi-party PSI

Parties A, B, C have IDs and we want to find a global intersection. This can currently be achieved by having multiple 1-to-1 client/server relationships. Could we combine this into a single interaction?

Duet intermediary

Parties A and B have IDs. Party C doesn't have IDs, but is acting as the intermediary between A and B. Is there a way for B to orchestrate the intersection of A's and B's IDs without C having access to the data? This would be in executed in duet, so C would have access to pointers from A and B and the intersection code would be run by C

Tagging others who likely have more experience with PSI protocols: @s0l0ist @bcebere @schoppmp. They can probably give better answers than me.

With the current protocol, multi-party PSI with a single interaction is likely not possible, because each party has their own private keys that they encrypt their own set with.

Having an intermediary that only works with pointers should be possible. One concern here could be what happens if C passes along its own data in a sort of MITM attack.

It'll be best if you opened a separate issue with more details. Thanks!

@TTitcombe

I'm not super familiar with Duet, but re multi-party PSI: what do you mean by "a single interaction"? Assuming you're only talking about regular PSI and not cardinality, A can discover the intersection of (A, B, C) by discovering the intersection of (A, B) from B and then using this to discover the intersection of (A, B, C) from C. Assuming A, B and C are communicating over a network this currently involves 2 network requests, which... is as good as it can get? Is what you want just a wrapper that would coordinate these requests?

Edit: Just realised you might have meant finding the global intersection without anyone knowing the local intersections?

@willclarktech

Thank you for your answer. Yes, @TTitcombe meant to find the global intersection without anyone knowing local intersections.
For instance, if we have A,B,C , could we get the intersection of those 3 without finding (A,B) first and then use this to discover (A,B,C)?

In a Duet context this would be important if a Data Scientist wants to work on the A,B,C's data (In this case, A,B,C would be Data Owners), and we don't want them or the Data Scientist to discover local intersections first.

@daler3 I believe you can achieve this with PSI based on homomorphic encryption (eg Paillier):

  1. A encrypts 1/0 for each element in the domain and sends the encrypted values to B
  2. B homomorphically multiplies by 1/0 and sends the encrypted values to C
  3. C homomorphically multiplies by 1/0, (optionally homomorphically sums for cardinality only) and sends the encrypted values to A
  4. A decrypts the result to discover the intersection

If I understand correctly, Duet lets the data scientist send data from one data owner to another without seeing it in between, correct? In the Duet example then, the data scientist could encrypt a 1 for every element in the domain before sending it round to the data owners. The downside of this is it scales poorly with the size of the domain (which is why this library uses a different approach).

The setup we have in this library very much assumes that only two parties are involved in each intersection. Maybe @schoppmp is familiar with related ways to do multi-party PSI?

Hi, it seems that there has been done some work on java-bindings in the java-test branch.
@Daniel-Liu-c0deb0t can I work further on this?
I am interested in doing that project, and in other possible contribution areas.

Thanks for your interest! Unfortunately, that branch is quite stale and we had several changes since then that are potentially breaking. I'd suggest only looking at that branch for reference and starting fresh (the toolchain has been updated earlier this year)