This repository contains an efficient implementation of Oblivious Message Retrieval.
Asynchronous messaging without leaking metadata is hard and every existing way to send message asynchronously requires metadata. The reason for this is obvious. To send a message to your friend asynchronously you atleast need to tag your message with your friend's identity. This enables the server to notify your friend of your message when they come online. Thus every message requires a "to" field. OMR is a way to send message asynchronously without revealing the "to" field. This means using OMR the server can deliver messages intended for your friend when they come online without ever learning which of the messages from the set of all messages were intended for them.
Other than being useful for asynchronous messaging without leaking metadata, OMR can be useful for improving the user experience of privacy preserving blockchains.
In privacy preserving blockchains like Aztec/Zcash all transactions are encrypted. At present the only way for users to find their pertaining transactions is to download entire set (or a representation) of transactions. Then trial decrypt them to find their pertaining ones. The problem with trial decryption is obvious. Client's experience of using the chain degrades as the chain gains traction. This is because client's cost of trial decryption, in terms of bandwidth and local computation, increases linearly with no. of transactions sent on chain globally. With OMR the processing of transactions (ie messages) is offloaded to the server. The client simply comes online, requests for their encrypted digest (containing only their transactions) from server, then decrypts the digest to find their transactions. With OMR client only needs to download 800Kb of data and perfom single decryption locally (decryption time: ~50ms). Hence, the entire process of retrieving transactions takes only few seconds and vastly improves user experience.
For security and efficiency, OMR processes messages in set of
For users to be able to receive messages as soon as they are sent to server and for an amazing user experience of privacy preserving protocols it is important for runtime of Phase 1 and Phase 2 to be as low as possible. The good news is even though FHE computations are expensive, they are highly parallelizable. For example, as visible in benchmarks, on single thread phase 1 roughly takes 5.2 minutes and phase 2 roughly takes 14 minutes. But increasing threads to 16, reduces phase 1 time to 52.2 seconds and phase 2 time to 58 seconds.
Now if you are wondering whether time reduces linearly with more cores, yes it does. Phase 2 time reduces linearly with more cores. So does Phase 1, but with a caveat of some precomputation required per user that needs to be done only once and stored.
Before running the program there are a few things to be kept in mind
- The code is only optimised to use power of 2 no. of cores. So running it on a machine with non power of 2 cores wouldn't use all cores maximally.
- The library has 3 features,
noise
,level
, andprecomp_pvw
.noise
: Only used for debugging purposes. Not relevant for testing performance.level
: Enables levelled implementation. This should be enabled when testing performance.precomp_pvw
: Combininglevel
withprecomp_pvw
will give the best performance if there are more than 8 no. of cores available. However,precomp_pvw
comes with additonal assumptions mentioned in PVW Precompute.
- The program performs best on x86 machines with AVX512IFMA instruction set.
- OMR has high memory requirements. For example, with 16 cores phase 1 consumes around 20GB and phase 2 consumes around 100GB.
Note To compile you will require protocol buffer compiler version >= 23.4. You can install it from here
First run setup
chmod +x setup.sh
./setup.sh
Basic structure of the command is
cargo run --release --features "[features]" [option] [cores]
-
features
is a string consisting of enabled features -
option
can either be 1,2, or 3 and must be supplied- (1) run demo
- (2) prints detection key size
- (3) prints the precomputed data size required per user for PVW precompute for a given no. of cores.
-
cores
is optional. If supplied it restricts the program to use given no. of cores. Otherwise program defaults to using all available cores.
To run demo with only level
feature without PVW precompute
cargo run --release --features "level" 1
To restrict the demo to use a fixed no. of cores
cargo run --release --features "level" 1 [cores]
To run demo with level
and precomp_pvw
(best performance)
cargo run --release --features "level,precomp_pvw" 1
To print the detection key size
cargo run --release --features "level" 2
To print the precomputed data size required per user for PVW precompute for a given no. of cores
cargo run --release --features "level" 3 [cores]
You can enable PVW precompute by enabling precomp_pvw
feature. Combining level
with precomp_vw
feature provides best performance. However, enabling PVW precompute assumes that you are willing to store some amount of additonal precomputed data per user. This precomputation only needs to be performed once and can be stored throughout user's lifetime.
If PVW precompute is not enabled then phase 1 is bottlenecked by pvw_decrypt
part, which can only use maximum of 4 cores at once. Thus, to scale performance of phase 1 with more no. of cores PVW precompute becomes essential.
You can get an estimate of data required to store per user for PVW precompute for a given number of cores by running:
cargo run --release --features "level" 3 [cores]
All benchmarks were performed on r6i.8xlarge
ec2 instance equipped with 3rd Gen Intel Xeon and 32vcpus (16 physical cores).
Single run of phase 1 and phase 2 processes
Performance with only level
feature:
Cores | Phase 1 time (in ms) | Phase 2 time (in ms) | Total time (in ms) |
---|---|---|---|
1 | 313680 | 844228 | 1157908 |
16 (32vcpus) | 52127 | 58095 | 110222 |
Performance with level
and precomp_pvw
features.
(Note: cores = 1 is omitted since precomp_pvw
only works with >=8 cores)
Cores | Phase 1 time (in ms) | Phase 2 time (in ms) | Total time (in ms) |
---|---|---|---|
16 (32vcpus) | 28202 | 57930 | 84226 |
Storage size for PVW Precompute per user on 16 cores: 47.5 MB
Each user needs to upload detection key to server. The key does not reveal anything. It's size is 163 MB.
The size of message digest that user downloads at the time of request is 800Kb.
Please check
If you want to work togther on OMR (and other FHE related projects) then please check contact and send a dm! Also feel free to join this telegram channel.
If you want run OMR in production, then please check contact and reach out directly!
Telegram: @janmajayamall
Email: janmajaya@caird.xyz
Development of OMR is supported through a grant from Aztec.