Privacy preserving DHTs
gpestana opened this issue · 9 comments
Distributed hash tables are powerful protocols that enable content discovering and routing in P2P networks. In "vanilla" DHT implementations (e.g. Kademlia), peer interactions leak a lot of information about who is in the network, who stores what files and who are the producers and consumers of content. This gives potential attackers a good picture of the network very easily and hinders anonymous and private interactions over P2P DHT overlay networks.
The desirable properties of a private and metadata resistant DHT are:
- Anonymity for producers of content: tracking down who was the originator of content stored in the DHT should not be possible.
- Anonymity for consumers of content: nodes that request content from the DHT should not be linked to the requested content by external actors.
- Plausible deniability of the files hosted in the network nodes: when peers query for content in the DHT, they should not be able to identify which peers are storing the content.
The main goal of this research thread is to survey and develop schemes that would improve DHT protocol's privacy without affecting performance and usability.
Questions
- What is the threat model for DHTs (security and anonymity)?
- What are the solutions out there for private and metadata resistant DHTs?
- What are the attack vectors for current DHT implementations and applications (e.g. IPFS)?
- Are there any schemes and techniques for hiding metadata in large DHTs without relying on external systems (e.g. Tor), while keeping performance?
Enabler questions
- What are the best techniques and tools for measuring DHT performance in the wild?
Development
- PoC of protocol and primitives which improve DHT privacy
literature
NOTE: this list is discontinued. check these and more papers' summaries at https://github.com/gpestana/p2psec/tree/master/research
-
Octopus: A Secure and Anonymous DHT Lookup
design of a DHT lookup mechanism which achieves security and anonymity. it achieves security by using attacker identification mechanisms to identify and remove adversary nodes from the network. it achieves anonymity by separating queries into different paths and adding dummy queries. simulations show that the protocol can achieve near-optimal anonymity.
paper
summary and notes -
k-anonymity Chord for anonymous query response
paper -
Why I'm not an entropist
a case against entropy-based anonymity networks
paper
summary and notes -
Bifrost : A Novel Anonymous Communication System with DHT
paper -
In Search of an Anonymous and Secure Lookup - Attacks on Structured Peer-to-Peer Anonymous Communication Systems
paper -
Information Leak in the Chord Lookup Protocol
paper -
Information Leaks in Structured Peer-to-Peer Anonymous Communication Systems
paper -
Adding Query Privacy to Robust DHTs
this paper presents a solution for hiding who is relaying key messages in the network, which is an approach often neglected in the literature (usually only provider and sender are protected).
paper -
Practical Robust Communication in DHTs Tolerating a Byzantine Adversary
paper -
A Survey on Routing in Anonymous Communication Protocols
survey about previous work on design and development of anonymous communication systems. the survey focus not only on DHTs but also other overlay networks.
paper -
Crawling the DHT for fun and profit
showing how BitTorrent Vuze DHT is vulnerable to metadata attacks. authors are able to monitor and learn who's downloading and what in BitTorrent just by sniffing to DHT interactions. the authors show its easy and cheap it is to create a tracker index and for copy right holders to see map host IPs to copy righted downloads.
paper
summary and notes -
Latency and Routing Efficiency Based Metric for Performance Comparison of DHT Overlay Networks
the general metrics for evaluate DHT performance are latency per successful lookup and percentage of successful lookups. these metrics are not sufficient to consider in separate. this paper presents a metric which incorporate both metrics and models three different timeout mechanisms.
paper
summary and notes -
Sloppy hasing and self organizing clusters
introduces the idea of DSHT - distributed sloppy hash tables - used by Coral CDN. DSHT was developed to achieve better performance and scalability when compared to vanilla-flavoured DHTs. DSHTs lets nodes locate nearby copies of a file without causing hot spots in the indexing infrastructure.
paper -
Pretty Private Group Management
private group management relying on DHTs with proof of concept implemented on top of Vuze DHT.
paper
≈
-
Routing in the Dark: Pitch Black
shows how the Freenetdarknet
routing algorithm for decentralised restricted-routing topologies (e.g. friend-2-friend) can be attacked by low-resourced attackers, which may lead to severe damage of network content availability.
paper, dissertation
summary and notes -
Perserving context privacy in DHT wireless sensor networks
paper -
ShadowWalker: peer-to-peer anonymous communication using redundant structured topologies
ShadowWalker is a low-latency P2P anonymous communication system, based on a random walk over a redundant structured topology
paper
summary and notes -
Mix-networks with Restricted Routes
studies mix networks in which each peer knows and communicates with only a restricted amount of peers, while studying the network's anonymity properties
paper -
TARANET: Traffic-Analysis Resistant Anonymity at the NETwork layer
presents TARANET which implements a traffic analysis resistant, anonymous and performant system implemented at the network layer
paper -
Scalable onion routing with torsk
paper -
Systematising Decentralisation and Privacy: Lessons from 15 years of Research
this work summarises the state of art of decentralisation technology and privacy. It draws conclusions from lessons learned from previous research and implementation efforts and outlines what are the current challenges and open research questions.
paper
summary and notes -
Peer-to-Peer Private Information Retrieval
paper -
The Devil is in the Metadata –New Privacy Challenges in Decentralised Online Social Networks
analysis of metadata leaked by decentralised and open social networks and comparison with centralised systems, in which a central authority relays all content.
paper
Privacy preserving DHT use cases in the wild
-
Fully private DHT on IPFS
(https://github.com/libp2p/developer-meetings, issue#6)
Private content routing based on 1) zero-knowledge bitswap queries and 2) assigning ephemeral IDs to lookup initiators. -
DHT support on ZeroNet
(https://github.com/HelloZeroNet/ZeroNet, issue#57) -
Hyperswam
(https://github.com/hyperswarm/dht) -
Tor directory server
-
Lightening Network
(https://github.com/lightningnetwork/lnd)
LN uses P2P onion routing to preserve sender and receiver anonymity in the network. What's the relay directory mechanism? Is it DHT-based and private? -
Censorship.no
(https://censorship.no/) -
Perfect Dark
(https://en.wikipedia.org/wiki/Perfect_Dark_(P2P))
TODO: check how Perfect Dark used DHT as an anonymous P2P network
Privacy and security in DHTs
Anonymity of lookup initiators and targets in DHTs follows and requires DHT security [a]. Security in DHTs is defined by the ability of malicious nodes to bias lookup requests. Being able to identify and ignore/kick out malicious nodes is important, in order for malicious nodes not to have the opportunity to attack other nodes by 1) poisoning routing tables and 2) answering to lookup requests with other malicious nodes.
Threat model
notes on DHT threat model (for anonymity)
Privacy and security layering
There are two layers of security in DHT networks (and any other network/centralised system). The different layers have different privacy and security attack surface, targets and types of adversaries:
-
service level: service level attack surface targets high level node interactions in DHTs - information about how nodes are participating in the networks (which lookups are requesting/replying for, what data is being stored, etc). If there are no mechanism in place to protect from service level attacks, it is relatively easy (in terms of resources and skills) to attack the network and obtain information from the nodes' interactions that must remain private.
-
network level: network level attack surface targets low level interactions between nodes, at a TCP/UDP level. Adversaries with large amounts of resources can obtain enough information by sniffing all connections in the networks and map/correlate nodes and their interactions. Protection mechanisms for network level attacks are mix networks, Tor, i2p, etc.
[a] Octopus: A Secure and Anonymous DHT Lookup
Anonymous neighbour surveillance and decentralised reputation system to prevent bias and routing table pollution attacks on DHTs
Reputation systems can be used as a mechanism to identify and signal malicious nodes in the network. A malicious node can manipulate the routing table before serving other nodes which ask for their routing table.
Lookup mechanism
In this setup, a lookup initiator selects from his routing table the node which ID is closest to the target ID and requests its full routing table. From the received routing table, the initiator selects again the node which is closest to the target ID and requests for its full routing table. This process keeps going recursivel until the target ID (or closest to it) is reached. Note that the initiator requests the full table in order to conceal which ID it is looking up.
Routing table pollution and bias attack
A wide category of attacks in this setup can be summarized as malicious nodes biasing lookup initiators by providing manipulated routing tables. Such attacks, according to [a] are Lookup Bias Attack, Lookup Misdirection Attack and Finger Pollution Attack. These attacks will be called as simply lookup bias attacks from now on.
Detecting lookup bias attacks, nodes keep a predessessor table which contains all the nodes that contains itself in their routing tables. With this table, node regularly anonymously requests the routing table of all nodes in its predecessor table. If their own IP is not in the table, then it can be assumed that the predecessor node is lying about the routing table and performing lookup bias attacks.
The lookup must be anonymous in order to ensure that the predecessor node cannot recognise whether it is a successor node requesting its routing table. Otherwise, the malicious node could respond accordingly.
Reputation system
When a node detects a lookup bias attack from a malicious node, it needs to let other nodes know that the node was lying. If any node can prove that 1) a biased routing table was served by a particular malicious node and 2) it should be in the routing table, the that information can be spread in the network for other nodes to recognise the malicious node as an attacker.
For point 1), the algorithm can enforce that a routing table must be hashed and signed by the server node, so no node can tamper the results routing table contents and everyone can verify who generated and served the routing table (integrity and non-repudiation properties).
For 2) Question: how can a node prove that a given routing table is manipulated (e.g. by showing that its ID should be part of the routing table, but it is not)?
Metrics for evaluating reputation system:
The metrics to evaluate the security mechanisms intend to understand how successful nodes in the network are at flagging malicious nodes. Metrics are 1) fraction of malicious nodes in the network; 2) false positive rate (honest node is a malicious node); 3) false alarm rate.
[a] Octopus: A Secure and Anonymous DHT Lookup
Onion routing in DHTs
- How could onion routing solve privacy problems in DHTs? (in-DHT onion routing)
- Overhead/efficiency (PK, circuit building)
- Attack surface (malicious relay nodes dropping messages, maybe construct multiple paths/circuits?
- Is there any example of onion routing in DHTs in the wild? If not, why not?
- The biggest attack surface to de-anonymise onion routing based DHTs is when the circuit is built. Some ways to prevent anonymising of onion circuit building could be:
- P2P PIR which maintain peers information
- shadowwalker (requires secure DHT)
- mixnet for building the circuit (but why? the problem is secure lookups - when nodes lie)
- bootstrap many times until create enough entropy and know the network well enough to choose the circuit nodes
in-DHT Onion routing circuit construction
One possible vulnerability when using in-DHT onion routing is to construct the onion routing so that adversaries cannot learn what are the nodes selected as onion routers - or put it simply, which nodes are part of the onion routing circuit.
Ideas for securing onion circuit construction:
- Passive bootstrapping: By passively listening to network traffic and perform random lookups, the node can gather enough information about the current state of the network to select X number of nodes to construct the onion circuit. This strategy may open possibility for targeted attacksL: an attacker can target a specific bootstrapping node so that it is part of the circuit. But would the attacker be able to know it?
- Group entropy for hiding lookup initiator identity (see comment below): group entropy could be used to create noise while nodes are listening to the networks and passively building the onion circuit.
- ShadowWalker [1]
Optimal circuit length
since the probability of a malicious node to be selected for the circuit increases with the length of the circuit itself, it is more secure to chose the least amount of nodes possible that keep anonymity in onion routing. "less onion relays means less attack surface".
To read:
Threat model
- End to end timing attacks possible iff first and last hop are malicious and working together [1, +?]
[1] Scalable Anonymous Communication with Provable Security
Group entropy for hiding lookup initiator identity [idea]
In order to create the entropy necessary to hide the lookup initiator in DHTs - similar to mixnets approach - build new groups of nodes which gossip lookup requests and responses within each other. Each node is responsible to forward the requests to other nodes outside the group and gossip back to the group. The lookup initiator will keep looking for when the final responses are returned and pick the best.
DHT group entropy for hiding lookup initiator: A lookup scheme to hide lookup initiator though gossip in small node groups. Conceptually, the idea is that small groups of nodes (maybe 50?) would communicate with each other through gossip, creating a self contained mix network. The message entropy within the group would hide who in the group initiated a particular request. So a node instead of communicate directly with peers in its finger table, it would first gossip the request to its group and all the group members would forward the request based on their finger tables. The lookup replies would also be gossiped within the group, so that the lookup initiator could pick the best responses.
properties
- adds entropy to lookup initiator
- borrows from MPC security: lookup initiator will pick the best/same responses from a set of untrustworthy parties
open questions
- best approach to create and maintain an overlay group?
- optimal group size vs overhead (or entropy vs performance)
- is a gossiping really the key?
- what if group nodes miss behave and do not gossip/forward? - there is incentives for all nodes to create as much entropy as possible in order for them to also have anonymity lookups.
This is a similar approach used by the anonymity system Crowds, which uses the concept of "blending into the crowd" (https://dl.acm.org/citation.cfm?id=290168)
DHT protocol routing mechanisms
Each DHT protocol implements different routing mechanisms. Describe how different DHT protocols perform routing and how are they vulnerable to metadata attacks.
- Kademlia
- CAN
- Chord
- Pastry
- Tapestry
Measuring anonymity / privacy in P2P networks
Measuring anonymity of systems is important to assess, compare and design anonymity of p2p systems. This thread contains research summaries, ideas and notes on how to measure anonymity/privacy in P2P systems and how to apply those measures to production systems.
[moved] check the notes and paper summaries at gpestana/p2psec#3
I had an idea for at least a slight improvement in DHT privacy, and was curious if you've encountered anything similar.
I was thinking about it in the context of BitTorrent, where it would be nice to make passive surveillance harder. The basic idea is to use public keys as identifiers, instead of hashes, and require, in the protocol, that nodes respond to a challenge where they must produce a signature with the corresponding private key in order to receive responses to their queries.
For a BitTorrent use-case, this would look like:
- Use the torrent infohash as a private key, and derive a corresponding public key
- Make DHT requests as usual, and respond to challenges using the private key
This would make passive surveillance harder. In particular, it would make indexing the DHT just by watching queries impossible. An attacker would be able to see pubkeys, but these pubkeys would be opaque to him, unless he knew the underlying infohashes. The attacker could not, upon getting a request for a key, make his own requests for the same key, since he would be unable to produce signatures in response to challenges.