Privacy preserving DHTs

Question

Privacy preserving DHTs

gpestana opened this issue 6 years ago · 9 comments

Distributed hash tables are powerful protocols that enable content discovering and routing in P2P networks. In "vanilla" DHT implementations (e.g. Kademlia), peer interactions leak a lot of information about who is in the network, who stores what files and who are the producers and consumers of content. This gives potential attackers a good picture of the network very easily and hinders anonymous and private interactions over P2P DHT overlay networks.

The desirable properties of a private and metadata resistant DHT are:

Anonymity for producers of content: tracking down who was the originator of content stored in the DHT should not be possible.
Anonymity for consumers of content: nodes that request content from the DHT should not be linked to the requested content by external actors.
Plausible deniability of the files hosted in the network nodes: when peers query for content in the DHT, they should not be able to identify which peers are storing the content.

The main goal of this research thread is to survey and develop schemes that would improve DHT protocol's privacy without affecting performance and usability.

Questions

What is the threat model for DHTs (security and anonymity)?
What are the solutions out there for private and metadata resistant DHTs?
What are the attack vectors for current DHT implementations and applications (e.g. IPFS)?
Are there any schemes and techniques for hiding metadata in large DHTs without relying on external systems (e.g. Tor), while keeping performance?

Enabler questions

What are the best techniques and tools for measuring DHT performance in the wild?

Development

PoC of protocol and primitives which improve DHT privacy

Answer 1 · 2018-08-03T11:49:46.000Z

literature

NOTE: this list is discontinued. check these and more papers' summaries at https://github.com/gpestana/p2psec/tree/master/research

≈

Routing in the Dark: Pitch Black
shows how the Freenet darknet routing algorithm for decentralised restricted-routing topologies (e.g. friend-2-friend) can be attacked by low-resourced attackers, which may lead to severe damage of network content availability.
paper, dissertation
summary and notes
Perserving context privacy in DHT wireless sensor networks
paper
ShadowWalker: peer-to-peer anonymous communication using redundant structured topologies
ShadowWalker is a low-latency P2P anonymous communication system, based on a random walk over a redundant structured topology
paper
summary and notes
Mix-networks with Restricted Routes
studies mix networks in which each peer knows and communicates with only a restricted amount of peers, while studying the network's anonymity properties
paper
TARANET: Traffic-Analysis Resistant Anonymity at the NETwork layer
presents TARANET which implements a traffic analysis resistant, anonymous and performant system implemented at the network layer
paper
Scalable onion routing with torsk
paper
Systematising Decentralisation and Privacy: Lessons from 15 years of Research
this work summarises the state of art of decentralisation technology and privacy. It draws conclusions from lessons learned from previous research and implementation efforts and outlines what are the current challenges and open research questions.
paper
summary and notes
Peer-to-Peer Private Information Retrieval
paper
The Devil is in the Metadata –New Privacy Challenges in Decentralised Online Social Networks
analysis of metadata leaked by decentralised and open social networks and comparison with centralised systems, in which a central authority relays all content.
paper

Answer 2 · 2018-08-03T11:49:54.000Z

Privacy preserving DHT use cases in the wild

Fully private DHT on IPFS
(https://github.com/libp2p/developer-meetings, issue#6)
Private content routing based on 1) zero-knowledge bitswap queries and 2) assigning ephemeral IDs to lookup initiators.
DHT support on ZeroNet
(https://github.com/HelloZeroNet/ZeroNet, issue#57)
Hyperswam
(https://github.com/hyperswarm/dht)
Tor directory server
Lightening Network
(https://github.com/lightningnetwork/lnd)
LN uses P2P onion routing to preserve sender and receiver anonymity in the network. What's the relay directory mechanism? Is it DHT-based and private?
Censorship.no
(https://censorship.no/)
Perfect Dark
(https://en.wikipedia.org/wiki/Perfect_Dark_(P2P))
TODO: check how Perfect Dark used DHT as an anonymous P2P network

Answer 3 · 2018-09-25T22:28:01.000Z

Privacy and security in DHTs

Anonymity of lookup initiators and targets in DHTs follows and requires DHT security [a]. Security in DHTs is defined by the ability of malicious nodes to bias lookup requests. Being able to identify and ignore/kick out malicious nodes is important, in order for malicious nodes not to have the opportunity to attack other nodes by 1) poisoning routing tables and 2) answering to lookup requests with other malicious nodes.

Threat model

notes on DHT threat model (for anonymity)

Privacy and security layering

There are two layers of security in DHT networks (and any other network/centralised system). The different layers have different privacy and security attack surface, targets and types of adversaries:

service level: service level attack surface targets high level node interactions in DHTs - information about how nodes are participating in the networks (which lookups are requesting/replying for, what data is being stored, etc). If there are no mechanism in place to protect from service level attacks, it is relatively easy (in terms of resources and skills) to attack the network and obtain information from the nodes' interactions that must remain private.
network level: network level attack surface targets low level interactions between nodes, at a TCP/UDP level. Adversaries with large amounts of resources can obtain enough information by sniffing all connections in the networks and map/correlate nodes and their interactions. Protection mechanisms for network level attacks are mix networks, Tor, i2p, etc.

[a] Octopus: A Secure and Anonymous DHT Lookup

Answer 4 · 2018-09-26T21:45:34.000Z

Anonymous neighbour surveillance and decentralised reputation system to prevent bias and routing table pollution attacks on DHTs

Reputation systems can be used as a mechanism to identify and signal malicious nodes in the network. A malicious node can manipulate the routing table before serving other nodes which ask for their routing table.

Lookup mechanism
In this setup, a lookup initiator selects from his routing table the node which ID is closest to the target ID and requests its full routing table. From the received routing table, the initiator selects again the node which is closest to the target ID and requests for its full routing table. This process keeps going recursivel until the target ID (or closest to it) is reached. Note that the initiator requests the full table in order to conceal which ID it is looking up.

Routing table pollution and bias attack
A wide category of attacks in this setup can be summarized as malicious nodes biasing lookup initiators by providing manipulated routing tables. Such attacks, according to [a] are Lookup Bias Attack, Lookup Misdirection Attack and Finger Pollution Attack. These attacks will be called as simply lookup bias attacks from now on.

Detecting lookup bias attacks, nodes keep a predessessor table which contains all the nodes that contains itself in their routing tables. With this table, node regularly anonymously requests the routing table of all nodes in its predecessor table. If their own IP is not in the table, then it can be assumed that the predecessor node is lying about the routing table and performing lookup bias attacks.

The lookup must be anonymous in order to ensure that the predecessor node cannot recognise whether it is a successor node requesting its routing table. Otherwise, the malicious node could respond accordingly.

Reputation system
When a node detects a lookup bias attack from a malicious node, it needs to let other nodes know that the node was lying. If any node can prove that 1) a biased routing table was served by a particular malicious node and 2) it should be in the routing table, the that information can be spread in the network for other nodes to recognise the malicious node as an attacker.

For point 1), the algorithm can enforce that a routing table must be hashed and signed by the server node, so no node can tamper the results routing table contents and everyone can verify who generated and served the routing table (integrity and non-repudiation properties).

For 2) Question: how can a node prove that a given routing table is manipulated (e.g. by showing that its ID should be part of the routing table, but it is not)?

Metrics for evaluating reputation system:
The metrics to evaluate the security mechanisms intend to understand how successful nodes in the network are at flagging malicious nodes. Metrics are 1) fraction of malicious nodes in the network; 2) false positive rate (honest node is a malicious node); 3) false alarm rate.

[a] Octopus: A Secure and Anonymous DHT Lookup

Answer 5 · 2018-10-06T06:13:12.000Z

Onion routing in DHTs

How could onion routing solve privacy problems in DHTs? (in-DHT onion routing)
Overhead/efficiency (PK, circuit building)
Attack surface (malicious relay nodes dropping messages, maybe construct multiple paths/circuits?
Is there any example of onion routing in DHTs in the wild? If not, why not?
The biggest attack surface to de-anonymise onion routing based DHTs is when the circuit is built. Some ways to prevent anonymising of onion circuit building could be:
- P2P PIR which maintain peers information
- shadowwalker (requires secure DHT)
- mixnet for building the circuit (but why? the problem is secure lookups - when nodes lie)
- bootstrap many times until create enough entropy and know the network well enough to choose the circuit nodes

in-DHT Onion routing circuit construction

One possible vulnerability when using in-DHT onion routing is to construct the onion routing so that adversaries cannot learn what are the nodes selected as onion routers - or put it simply, which nodes are part of the onion routing circuit.

Ideas for securing onion circuit construction:

Passive bootstrapping: By passively listening to network traffic and perform random lookups, the node can gather enough information about the current state of the network to select X number of nodes to construct the onion circuit. This strategy may open possibility for targeted attacksL: an attacker can target a specific bootstrapping node so that it is part of the circuit. But would the attacker be able to know it?
Group entropy for hiding lookup initiator identity (see comment below): group entropy could be used to create noise while nodes are listening to the networks and passively building the onion circuit.
ShadowWalker [1]

Optimal circuit length

since the probability of a malicious node to be selected for the circuit increases with the length of the circuit itself, it is more secure to chose the least amount of nodes possible that keep anonymity in onion routing. "less onion relays means less attack surface".

To read:

On the Optimal Path Length for Tor

Threat model

End to end timing attacks possible iff first and last hop are malicious and working together [1, +?]

[1] Scalable Anonymous Communication with Provable Security

Answer 6 · 2018-10-31T10:10:17.000Z

Group entropy for hiding lookup initiator identity [idea]

In order to create the entropy necessary to hide the lookup initiator in DHTs - similar to mixnets approach - build new groups of nodes which gossip lookup requests and responses within each other. Each node is responsible to forward the requests to other nodes outside the group and gossip back to the group. The lookup initiator will keep looking for when the final responses are returned and pick the best.

DHT group entropy for hiding lookup initiator: A lookup scheme to hide lookup initiator though gossip in small node groups. Conceptually, the idea is that small groups of nodes (maybe 50?) would communicate with each other through gossip, creating a self contained mix network. The message entropy within the group would hide who in the group initiated a particular request. So a node instead of communicate directly with peers in its finger table, it would first gossip the request to its group and all the group members would forward the request based on their finger tables. The lookup replies would also be gossiped within the group, so that the lookup initiator could pick the best responses.

properties

adds entropy to lookup initiator
borrows from MPC security: lookup initiator will pick the best/same responses from a set of untrustworthy parties

open questions

best approach to create and maintain an overlay group?
optimal group size vs overhead (or entropy vs performance)
is a gossiping really the key?
what if group nodes miss behave and do not gossip/forward? - there is incentives for all nodes to create as much entropy as possible in order for them to also have anonymity lookups.

reddit hackernews

This is a similar approach used by the anonymity system Crowds, which uses the concept of "blending into the crowd" (https://dl.acm.org/citation.cfm?id=290168)

Answer 7 · 2018-11-07T16:38:08.000Z

DHT protocol routing mechanisms

Each DHT protocol implements different routing mechanisms. Describe how different DHT protocols perform routing and how are they vulnerable to metadata attacks.

Kademlia
CAN
Chord
Pastry
Tapestry

Answer 8 · 2019-01-09T10:32:02.000Z

Measuring anonymity / privacy in P2P networks

Measuring anonymity of systems is important to assess, compare and design anonymity of p2p systems. This thread contains research summaries, ideas and notes on how to measure anonymity/privacy in P2P systems and how to apply those measures to production systems.

[moved] check the notes and paper summaries at gpestana/p2psec#3

Answer 9 · 2022-08-16T22:08:29.000Z

I had an idea for at least a slight improvement in DHT privacy, and was curious if you've encountered anything similar.

I was thinking about it in the context of BitTorrent, where it would be nice to make passive surveillance harder. The basic idea is to use public keys as identifiers, instead of hashes, and require, in the protocol, that nodes respond to a challenge where they must produce a signature with the corresponding private key in order to receive responses to their queries.

For a BitTorrent use-case, this would look like:

Use the torrent infohash as a private key, and derive a corresponding public key
Make DHT requests as usual, and respond to challenges using the private key

This would make passive surveillance harder. In particular, it would make indexing the DHT just by watching queries impossible. An attacker would be able to see pubkeys, but these pubkeys would be opaque to him, unless he knew the underlying infohashes. The attacker could not, upon getting a request for a key, make his own requests for the same key, since he would be unable to produce signatures in response to challenges.