A hypothetical design of a zero-knowledge contact discovery service (CDS).
The top-level goal is for clients to map their address books of phone numbers to user IDs (or other equivalent tokens), while preserving the following properties:
- The server only learns a phone number is in a client's address book when and if the client refers to their user ID.
- The client can only refer to a user ID iff they know the phone number for that user ID.
- The client cannot make offline guesses about phone numbers.
A server is initialized with a secret scalar, d_S
, and a map of phone numbers to user IDs. The
server creates an internal mapping of blinded data.
For each (p, u)
pair, the server performs the following steps:
- Hashes the phone number to a point on the curve:
P = hash2curve(p)
- Calculates the server-blinded phone number point:
sP = [d_S]P
- Encodes the user ID via a bijective mapping (e.g. try-and-increment) as a point on the curve:
U = encode2curve(p)
- Hashes the phone number:
h = SHA256(p)
- Calculates the double-blinded phone number point:
psU = [h*d_S]U
The server then groups pairs of (sP, hsU)
values by an N
-bit prefix of h
.
A client is initialized with a secret scalar, d_C
and a set of phone numbers {p_0…p_N}
.
To map a phone number p
to a user ID u
, the client performs the following steps:
- Hashes the phone number:
h = SHA256(p)
- Hashes the phone number to a point on the curve:
P = hash2curve(p)
- Calculates the client-blinded phone number point:
cP = [d_C]P
The client sends an N
-bit prefix of h
and cP
to the server (not necessarily in the same
request). The prefix of h
can serve as an N
-bit distinguisher of p
, so N
should be chosen to
ensure a collision rate which preserves the privacy of p
. cP
cannot serve as a distinguisher of
p
without knowing d_C
. Consequently, the server learns no information about p
from this
request.
The server calculates the double-blinded point scP = [d_S]cP
and returns it along with a set of
(sP, hsU)
pairs with the corresponding prefix of h
. Without d_S
, the client cannot calculate
P
or U
values, and cannot learn anything about the original p
or u
values. At most, the
client learns the cardinality of the response set and can estimate the size of the server's full
address book.
The client calculates the server-blinded point sP = [(1/d_C)]scP
and searches for it in the set
of pairs. If a pair exists with sP
, the client calculates the server-blinded user ID point
sU = [(1/h)]hsU
. The client sends sU
to the server, who fully unblinds it (U = [(1/d_S)]sU
)
and decodes the point into a user ID u
(e.g. in order to connect the two users and i.e. without
returning u
to the client). The use of sU
as a proxy for u
allows the server to keep u
values entirely private (if desired), and the server only learns of the client's knowledge of p
when and if the client attempts to use that client.
Finally, an attacker attempting to discover whether the server has specific p
values will be
unable to do so without making online requests and can thus be rate-limited.
Assuming a user base of 100M users, a 15-bit hash prefix would yield buckets of around 3K phone numbers, with a total response size of 197KiB. A 12-bit hash prefix would yield buckets of around 25K phone numbers, with a total response size of 1.54MiB.
Copyright © 2023 Coda Hale