OpenMined/PSI

Rust Bindings

Closed this issue · 7 comments

Hey, I am willing to help create a Rust binding for this project over the summer. Here's what I think may be possible, based on a brief look at the code:

  1. Create a simple wrapper over the C++ code (like the bindings in the other languages). Rust cannot interface with C++ directly, so the backend Bloom filter code would have to be exposed through C before it can be called from Rust.
  2. Reimplement the Bloom filter in Rust. Rust is fast and memory-safe, so this may help guarantee memory safety and eliminate some classes of bugs.
  3. If possible, I want to try using SIMD intrinsics in the Rust reimplementation of the Bloom filter for speedups. I remember people applying SIMD to Bloom filters in other fields, such as genomic sketching. I am not sure whether this is possible with the current implementation, but I will look into it if I reach this point.
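For reference on point 2, a minimal Bloom filter in Rust could look like the sketch below. It is not the PSI library's implementation: the `Vec<u64>` bit layout, the use of std's `DefaultHasher`, and the double-hashing scheme (index i = h1 + i*h2 mod m) are assumptions chosen for illustration. It shows the defining property: a query can return a false positive, but never a false negative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative Bloom filter sketch; not the PSI library's implementation.
pub struct BloomFilter {
    bits: Vec<u64>,  // bit array packed into 64-bit words
    num_bits: u64,   // m: total number of bits
    num_hashes: u32, // k: number of hash functions
}

impl BloomFilter {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let words = ((num_bits + 63) / 64) as usize;
        BloomFilter { bits: vec![0; words], num_bits, num_hashes }
    }

    // Two base hashes from the same hasher, seeded with different prefixes.
    fn base_hashes<T: Hash + ?Sized>(item: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        0u8.hash(&mut h1);
        item.hash(&mut h1);
        let mut h2 = DefaultHasher::new();
        1u8.hash(&mut h2);
        item.hash(&mut h2);
        // Force h2 to be odd so the k probes are not all identical.
        (h1.finish(), h2.finish() | 1)
    }

    // Double hashing: the i-th probe is (h1 + i * h2) mod m.
    fn index(&self, h1: u64, h2: u64, i: u64) -> u64 {
        h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let (h1, h2) = Self::base_hashes(item);
        for i in 0..self.num_hashes as u64 {
            let bit = self.index(h1, h2, i);
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means "definitely absent"; `true` means "probably present".
    pub fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        let (h1, h2) = Self::base_hashes(item);
        (0..self.num_hashes as u64).all(|i| {
            let bit = self.index(h1, h2, i);
            (self.bits[(bit / 64) as usize] & (1u64 << (bit % 64))) != 0
        })
    }
}

fn main() {
    let mut bf = BloomFilter::new(1024, 3);
    bf.insert("alice@example.com");
    bf.insert("bob@example.com");
    assert!(bf.contains("alice@example.com")); // inserted items are always found
    // Usually false, but a false positive is possible by design.
    println!("carol present? {}", bf.contains("carol@example.com"));
}
```

Packing the bits into `u64` words is also the layout a SIMD version would most naturally operate on.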

If the maintainers are interested in this, I can start getting familiar with the code. I would prefer to work on this project from around mid-June to August/September.

About me: I am a high school student going into undergrad in CS next year. I have a lot of experience writing research software in fields like deep learning and bioinformatics. Currently, I am getting familiar with Rust by working on a SIMD-accelerated library for string edit distances; in that project, I've seen SIMD give up to 20-30x speedups over the scalar implementations. I have no formal background in security, but I am relatively familiar with the concept of Bloom filters, and I've worked on adversarial robustness for deep learning. I am willing to dedicate a significant chunk of time to learning, writing code, benchmarking, and testing for this project. My goal is to make a useful Rust binding and also gain some experience working on a large software project.

Hey Daniel, as far as I know, the Bloom filter implementation isn't meant to be exposed to the user; it's an internal component of the library. Would the bindings still require reimplementing it in Rust?

Ah sorry, I took another look, and it seems that for step 1 Rust can call the client and server APIs directly. Looking at the Go code, this step seems fairly simple: the bindings only need to call the relevant API functions exposed through C. For step 2, I could then work on reimplementing lower-level components in Rust. Of course, the main benefit of that would be Rust's (hopefully) better memory safety.
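The usual Rust pattern for step 1 is a thin `unsafe` layer over the C functions plus a safe wrapper type that owns the handle and frees it in `Drop`. The sketch below assumes a hypothetical C API (`raw_server_create`, `raw_server_reveal_intersection`, `raw_server_delete`); these are not the project's actual C functions. To keep the example self-contained and runnable, the "C" side is stubbed out in Rust as `extern "C"` functions; a real binding would instead declare them in an `extern "C" { ... }` block and link against the compiled C library.

```rust
use std::ffi::c_int;
use std::ptr::{self, NonNull};

// ---- Hypothetical C API, stubbed in Rust so the sketch runs standalone. ----
// A real binding would declare these in `extern "C" { ... }` and link the C library.
#[repr(C)]
pub struct RawServer {
    reveal_intersection: bool,
}

/// Writes a new handle to `*out`; returns 0 on success, nonzero on error.
pub unsafe extern "C" fn raw_server_create(reveal_intersection: bool, out: *mut *mut RawServer) -> c_int {
    if out.is_null() {
        return 1;
    }
    unsafe { *out = Box::into_raw(Box::new(RawServer { reveal_intersection })) };
    0
}

pub unsafe extern "C" fn raw_server_reveal_intersection(ctx: *const RawServer) -> bool {
    unsafe { (*ctx).reveal_intersection }
}

pub unsafe extern "C" fn raw_server_delete(ctx: *mut RawServer) {
    if !ctx.is_null() {
        drop(unsafe { Box::from_raw(ctx) });
    }
}

// ---- Safe Rust wrapper: owns the handle and frees it exactly once. ----
pub struct Server {
    raw: NonNull<RawServer>,
}

impl Server {
    pub fn new(reveal_intersection: bool) -> Result<Self, String> {
        let mut handle: *mut RawServer = ptr::null_mut();
        // SAFETY: `handle` is a valid out-parameter for the duration of the call.
        let status = unsafe { raw_server_create(reveal_intersection, &mut handle) };
        if status != 0 {
            return Err(format!("raw_server_create failed with status {status}"));
        }
        NonNull::new(handle)
            .map(|raw| Server { raw })
            .ok_or_else(|| "raw_server_create returned a null handle".to_string())
    }

    pub fn reveal_intersection(&self) -> bool {
        // SAFETY: `raw` is non-null and stays valid until `drop`.
        unsafe { raw_server_reveal_intersection(self.raw.as_ptr()) }
    }
}

impl Drop for Server {
    fn drop(&mut self) {
        // SAFETY: the handle came from `raw_server_create` and is freed only here.
        unsafe { raw_server_delete(self.raw.as_ptr()) }
    }
}

fn main() {
    let server = Server::new(true).expect("failed to create server");
    println!("reveal_intersection = {}", server.reveal_intersection());
} // `server` is dropped here, which calls `raw_server_delete`.
```

Users of the crate only ever see `Server`; all `unsafe` code stays inside the wrapper, which is what makes the binding memory-safe from the caller's side.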

I don't see a roadmap for this project anywhere, so if there's another more important component that I could work on instead, then please tell me.

Hi Daniel,
I think Rust bindings would be great! A while back I worked on generating bindings for a C library using bindgen, and I just noticed that bindgen also has basic C++ support. Maybe that is an option here? On the other hand, our library makes heavy use of templates (like StatusOr, which is the C++ equivalent of Rust's std::Result). While all template types are concretely instantiated (i.e., we don't need templates to be exposed as Rust generics), this might still cause problems. But as a fallback, you can always use the existing C bindings. What do you think?
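On the StatusOr point: a C interface cannot return a `StatusOr` directly, so a common convention is a status code plus out-parameters for the value and for an error message, which the Rust side then folds back into a `Result`. The sketch below assumes such a convention. `raw_bloom_filter_bits` and `raw_free_string` are hypothetical stand-ins (stubbed in Rust so the example runs), not functions from the project's C bindings. The computation inside is just the textbook optimal Bloom filter size m = -n ln(p) / (ln 2)^2, used as a placeholder for an operation that can fail.

```rust
use std::ffi::{c_char, c_int, CStr, CString};
use std::ptr;

// ---- Hypothetical C API, stubbed in Rust so the sketch runs standalone. ----
/// Returns 0 and writes the bit count to `*bits_out`, or returns nonzero and writes
/// a heap-allocated message to `*error_out` (to be freed with `raw_free_string`).
pub unsafe extern "C" fn raw_bloom_filter_bits(
    num_elements: u64,
    fpr: f64,
    bits_out: *mut u64,
    error_out: *mut *mut c_char,
) -> c_int {
    if num_elements == 0 || !(fpr > 0.0 && fpr < 1.0) {
        let msg = CString::new("num_elements must be > 0 and fpr must be in (0, 1)").unwrap();
        unsafe { *error_out = msg.into_raw() };
        return 1;
    }
    // Textbook optimal size: m = -n * ln(p) / (ln 2)^2, rounded up.
    let ln2 = std::f64::consts::LN_2;
    let bits = (-(num_elements as f64) * fpr.ln() / (ln2 * ln2)).ceil();
    unsafe { *bits_out = bits as u64 };
    0
}

pub unsafe extern "C" fn raw_free_string(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}

// ---- Safe wrapper: status code + out-parameters folded into a Result. ----
pub fn bloom_filter_bits(num_elements: u64, fpr: f64) -> Result<u64, String> {
    let mut bits = 0u64;
    let mut err: *mut c_char = ptr::null_mut();
    // SAFETY: both out-parameters point to valid locals for the duration of the call.
    let status = unsafe { raw_bloom_filter_bits(num_elements, fpr, &mut bits, &mut err) };
    if status == 0 {
        return Ok(bits);
    }
    if err.is_null() {
        return Err(format!("failed with status {status}"));
    }
    // Copy the message into an owned String, then hand the buffer back to the C side.
    let msg = unsafe { CStr::from_ptr(err) }.to_string_lossy().into_owned();
    unsafe { raw_free_string(err) };
    Err(msg)
}

fn main() {
    println!("{:?}", bloom_filter_bits(1000, 0.01));
    println!("{:?}", bloom_filter_bits(1000, 1.5));
}
```

The error buffer is released through a function exported by the same library that allocated it, which avoids mixing allocators across the FFI boundary.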

Looking at the Go code, there don't seem to be many functions that need to be exposed for the Rust bindings to work, so it shouldn't be too difficult to match the Go bindings using the existing C bindings. Since I'm not very familiar with the code yet, I'm not sure whether C++ bindings would be strictly better than C bindings, so I think binding to the existing C layer is the easier route in Rust.

Is there a roadmap/timeline for when things should be done for downstream libraries?

Daniel,

We currently do not have an official roadmap - it is a work in progress. The idea is to get as many bindings done that are relevant to what the industry needs. We are building out bindings and additional functionality in parallel.

You may use either C or C++, whichever works better for Rust. I don’t have a good answer for you there.

One thing to note about SIMD: I'm sure it would speed up some of the code, but it needs to be completely configurable with Bazel, as those intrinsics do not carry over to WebAssembly that easily and could actually be slower. My suggestion is to first work on proper bindings; then we could experiment with intrinsics afterward.
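In Rust, a comparable compile-time switch is the `#[cfg(...)]` attribute. The sketch below is hypothetical and only illustrates the pattern described above: an SSE2 path that ORs two word-packed bit arrays (for two Bloom filters with the same size and hash functions, that gives their union) when compiling for x86_64, and a plain scalar loop on every other target, including wasm32. How such a switch would be wired into the project's Bazel build is not shown. Whether the intrinsic path actually beats the scalar loop would need benchmarking, since LLVM can often auto-vectorize the scalar version on its own.

```rust
/// OR `src` into `dst` word by word: SSE2 path, compiled only for x86_64 with SSE2.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
fn or_into(dst: &mut [u64], src: &[u64]) {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_or_si128, _mm_storeu_si128};
    assert_eq!(dst.len(), src.len());
    let pairs = dst.len() / 2; // each 128-bit register holds two u64 words
    for i in 0..pairs {
        // SAFETY: words 2*i and 2*i+1 are in bounds for both slices (lengths checked
        // above), and the unaligned load/store intrinsics have no alignment requirement.
        unsafe {
            let d = dst.as_mut_ptr().add(2 * i) as *mut __m128i;
            let s = src.as_ptr().add(2 * i) as *const __m128i;
            let v = _mm_or_si128(_mm_loadu_si128(d), _mm_loadu_si128(s));
            _mm_storeu_si128(d, v);
        }
    }
    // Scalar tail for an odd number of words.
    for (d, s) in dst[pairs * 2..].iter_mut().zip(&src[pairs * 2..]) {
        *d |= *s;
    }
}

/// Portable fallback used on every other target (e.g. wasm32).
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
fn or_into(dst: &mut [u64], src: &[u64]) {
    assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d |= *s;
    }
}

fn main() {
    let mut a = vec![0b0001u64, 0, 1 << 63];
    let b = vec![0b0100u64, 7, 0];
    or_into(&mut a, &b);
    println!("{:?}", a); // [5, 7, 9223372036854775808]
}
```

Both versions share one signature, so callers never need to know which path was compiled in.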

Welcome - happy to have you on board!

I see that there is work on automatically generating bindings in #81, which addresses this issue. If bindgen does not work well and manual binding generation is needed, I am able to work on it. Sorry I wasn't able to work on this earlier; I had some stuff come up that I had to address.

resolved with #82