gpestana/notes

Privacy preserving DHTs

gpestana opened this issue · 9 comments

Distributed hash tables are powerful protocols that enable content discovering and routing in P2P networks. In "vanilla" DHT implementations (e.g. Kademlia), peer interactions leak a lot of information about who is in the network, who stores what files and who are the producers and consumers of content. This gives potential attackers a good picture of the network very easily and hinders anonymous and private interactions over P2P DHT overlay networks.

The desirable properties of a private and metadata resistant DHT are:

  1. Anonymity for producers of content: tracking down who was the originator of content stored in the DHT should not be possible.
  2. Anonymity for consumers of content: nodes that request content from the DHT should not be linked to the requested content by external actors.
  3. Plausible deniability of the files hosted in the network nodes: when peers query for content in the DHT, they should not be able to identify which peers are storing the content.

The main goal of this research thread is to survey and develop schemes that would improve DHT protocol's privacy without affecting performance and usability.

Questions

  • What is the threat model for DHTs (security and anonymity)?
  • What are the solutions out there for private and metadata resistant DHTs?
  • What are the attack vectors for current DHT implementations and applications (e.g. IPFS)?
  • Are there any schemes and techniques for hiding metadata in large DHTs without relying on external systems (e.g. Tor), while keeping performance?

Enabler questions

  • What are the best techniques and tools for measuring DHT performance in the wild?

Development

  • PoC of protocol and primitives which improve DHT privacy

literature

NOTE: this list is discontinued. check these and more papers' summaries at https://github.com/gpestana/p2psec/tree/master/research


  • Octopus: A Secure and Anonymous DHT Lookup
    design of a DHT lookup mechanism which achieves security and anonymity. it achieves security by using attacker identification mechanisms to identify and remove adversary nodes from the network. it achieves anonymity by separating queries into different paths and adding dummy queries. simulations show that the protocol can achieve near-optimal anonymity.
    paper
    summary and notes

  • k-anonymity Chord for anonymous query response
    paper

  • Why I'm not an entropist
    a case against entropy-based anonymity networks
    paper
    summary and notes

  • Bifrost : A Novel Anonymous Communication System with DHT
    paper

  • In Search of an Anonymous and Secure Lookup - Attacks on Structured Peer-to-Peer Anonymous Communication Systems
    paper

  • Information Leak in the Chord Lookup Protocol
    paper

  • Information Leaks in Structured Peer-to-Peer Anonymous Communication Systems
    paper

  • Adding Query Privacy to Robust DHTs
    this paper presents a solution for hiding who is relaying key messages in the network, which is an approach often neglected in the literature (usually only provider and sender are protected).
    paper

  • Practical Robust Communication in DHTs Tolerating a Byzantine Adversary
    paper

  • A Survey on Routing in Anonymous Communication Protocols
    survey about previous work on design and development of anonymous communication systems. the survey focus not only on DHTs but also other overlay networks.
    paper

  • Crawling the DHT for fun and profit
    showing how BitTorrent Vuze DHT is vulnerable to metadata attacks. authors are able to monitor and learn who's downloading and what in BitTorrent just by sniffing to DHT interactions. the authors show its easy and cheap it is to create a tracker index and for copy right holders to see map host IPs to copy righted downloads.
    paper
    summary and notes

  • Latency and Routing Efficiency Based Metric for Performance Comparison of DHT Overlay Networks
    the general metrics for evaluate DHT performance are latency per successful lookup and percentage of successful lookups. these metrics are not sufficient to consider in separate. this paper presents a metric which incorporate both metrics and models three different timeout mechanisms.
    paper
    summary and notes

  • Sloppy hasing and self organizing clusters
    introduces the idea of DSHT - distributed sloppy hash tables - used by Coral CDN. DSHT was developed to achieve better performance and scalability when compared to vanilla-flavoured DHTs. DSHTs lets nodes locate nearby copies of a file without causing hot spots in the indexing infrastructure.
    paper

  • Pretty Private Group Management
    private group management relying on DHTs with proof of concept implemented on top of Vuze DHT.
    paper

  • Routing in the Dark: Pitch Black
    shows how the Freenet darknet routing algorithm for decentralised restricted-routing topologies (e.g. friend-2-friend) can be attacked by low-resourced attackers, which may lead to severe damage of network content availability.
    paper, dissertation
    summary and notes

  • Perserving context privacy in DHT wireless sensor networks
    paper

  • ShadowWalker: peer-to-peer anonymous communication using redundant structured topologies
    ShadowWalker is a low-latency P2P anonymous communication system, based on a random walk over a redundant structured topology
    paper
    summary and notes

  • Mix-networks with Restricted Routes
    studies mix networks in which each peer knows and communicates with only a restricted amount of peers, while studying the network's anonymity properties
    paper

  • TARANET: Traffic-Analysis Resistant Anonymity at the NETwork layer
    presents TARANET which implements a traffic analysis resistant, anonymous and performant system implemented at the network layer
    paper

  • Scalable onion routing with torsk
    paper

  • Systematising Decentralisation and Privacy: Lessons from 15 years of Research
    this work summarises the state of art of decentralisation technology and privacy. It draws conclusions from lessons learned from previous research and implementation efforts and outlines what are the current challenges and open research questions.
    paper
    summary and notes

  • Peer-to-Peer Private Information Retrieval
    paper

  • The Devil is in the Metadata –New Privacy Challenges in Decentralised Online Social Networks
    analysis of metadata leaked by decentralised and open social networks and comparison with centralised systems, in which a central authority relays all content.
    paper

Privacy preserving DHT use cases in the wild

Privacy and security in DHTs

Anonymity of lookup initiators and targets in DHTs follows and requires DHT security [a]. Security in DHTs is defined by the ability of malicious nodes to bias lookup requests. Being able to identify and ignore/kick out malicious nodes is important, in order for malicious nodes not to have the opportunity to attack other nodes by 1) poisoning routing tables and 2) answering to lookup requests with other malicious nodes.

Threat model

notes on DHT threat model (for anonymity)

Privacy and security layering

There are two layers of security in DHT networks (and any other network/centralised system). The different layers have different privacy and security attack surface, targets and types of adversaries:

  1. service level: service level attack surface targets high level node interactions in DHTs - information about how nodes are participating in the networks (which lookups are requesting/replying for, what data is being stored, etc). If there are no mechanism in place to protect from service level attacks, it is relatively easy (in terms of resources and skills) to attack the network and obtain information from the nodes' interactions that must remain private.

  2. network level: network level attack surface targets low level interactions between nodes, at a TCP/UDP level. Adversaries with large amounts of resources can obtain enough information by sniffing all connections in the networks and map/correlate nodes and their interactions. Protection mechanisms for network level attacks are mix networks, Tor, i2p, etc.


[a] Octopus: A Secure and Anonymous DHT Lookup

Anonymous neighbour surveillance and decentralised reputation system to prevent bias and routing table pollution attacks on DHTs

Reputation systems can be used as a mechanism to identify and signal malicious nodes in the network. A malicious node can manipulate the routing table before serving other nodes which ask for their routing table.

Lookup mechanism
In this setup, a lookup initiator selects from his routing table the node which ID is closest to the target ID and requests its full routing table. From the received routing table, the initiator selects again the node which is closest to the target ID and requests for its full routing table. This process keeps going recursivel until the target ID (or closest to it) is reached. Note that the initiator requests the full table in order to conceal which ID it is looking up.

Routing table pollution and bias attack
A wide category of attacks in this setup can be summarized as malicious nodes biasing lookup initiators by providing manipulated routing tables. Such attacks, according to [a] are Lookup Bias Attack, Lookup Misdirection Attack and Finger Pollution Attack. These attacks will be called as simply lookup bias attacks from now on.

Detecting lookup bias attacks, nodes keep a predessessor table which contains all the nodes that contains itself in their routing tables. With this table, node regularly anonymously requests the routing table of all nodes in its predecessor table. If their own IP is not in the table, then it can be assumed that the predecessor node is lying about the routing table and performing lookup bias attacks.

The lookup must be anonymous in order to ensure that the predecessor node cannot recognise whether it is a successor node requesting its routing table. Otherwise, the malicious node could respond accordingly.

Reputation system
When a node detects a lookup bias attack from a malicious node, it needs to let other nodes know that the node was lying. If any node can prove that 1) a biased routing table was served by a particular malicious node and 2) it should be in the routing table, the that information can be spread in the network for other nodes to recognise the malicious node as an attacker.

For point 1), the algorithm can enforce that a routing table must be hashed and signed by the server node, so no node can tamper the results routing table contents and everyone can verify who generated and served the routing table (integrity and non-repudiation properties).

For 2) Question: how can a node prove that a given routing table is manipulated (e.g. by showing that its ID should be part of the routing table, but it is not)?

Metrics for evaluating reputation system:
The metrics to evaluate the security mechanisms intend to understand how successful nodes in the network are at flagging malicious nodes. Metrics are 1) fraction of malicious nodes in the network; 2) false positive rate (honest node is a malicious node); 3) false alarm rate.


[a] Octopus: A Secure and Anonymous DHT Lookup

Onion routing in DHTs

  • How could onion routing solve privacy problems in DHTs? (in-DHT onion routing)
  • Overhead/efficiency (PK, circuit building)
  • Attack surface (malicious relay nodes dropping messages, maybe construct multiple paths/circuits?
  • Is there any example of onion routing in DHTs in the wild? If not, why not?
  • The biggest attack surface to de-anonymise onion routing based DHTs is when the circuit is built. Some ways to prevent anonymising of onion circuit building could be:
    • P2P PIR which maintain peers information
    • shadowwalker (requires secure DHT)
    • mixnet for building the circuit (but why? the problem is secure lookups - when nodes lie)
    • bootstrap many times until create enough entropy and know the network well enough to choose the circuit nodes

in-DHT Onion routing circuit construction

One possible vulnerability when using in-DHT onion routing is to construct the onion routing so that adversaries cannot learn what are the nodes selected as onion routers - or put it simply, which nodes are part of the onion routing circuit.

Ideas for securing onion circuit construction:

  • Passive bootstrapping: By passively listening to network traffic and perform random lookups, the node can gather enough information about the current state of the network to select X number of nodes to construct the onion circuit. This strategy may open possibility for targeted attacksL: an attacker can target a specific bootstrapping node so that it is part of the circuit. But would the attacker be able to know it?
  • Group entropy for hiding lookup initiator identity (see comment below): group entropy could be used to create noise while nodes are listening to the networks and passively building the onion circuit.
  • ShadowWalker [1]

Optimal circuit length

since the probability of a malicious node to be selected for the circuit increases with the length of the circuit itself, it is more secure to chose the least amount of nodes possible that keep anonymity in onion routing. "less onion relays means less attack surface".

To read:

Threat model

  • End to end timing attacks possible iff first and last hop are malicious and working together [1, +?]

[1] Scalable Anonymous Communication with Provable Security

Group entropy for hiding lookup initiator identity [idea]

In order to create the entropy necessary to hide the lookup initiator in DHTs - similar to mixnets approach - build new groups of nodes which gossip lookup requests and responses within each other. Each node is responsible to forward the requests to other nodes outside the group and gossip back to the group. The lookup initiator will keep looking for when the final responses are returned and pick the best.

DHT group entropy for hiding lookup initiator: A lookup scheme to hide lookup initiator though gossip in small node groups. Conceptually, the idea is that small groups of nodes (maybe 50?) would communicate with each other through gossip, creating a self contained mix network. The message entropy within the group would hide who in the group initiated a particular request. So a node instead of communicate directly with peers in its finger table, it would first gossip the request to its group and all the group members would forward the request based on their finger tables. The lookup replies would also be gossiped within the group, so that the lookup initiator could pick the best responses.

properties

  • adds entropy to lookup initiator
  • borrows from MPC security: lookup initiator will pick the best/same responses from a set of untrustworthy parties

open questions

  • best approach to create and maintain an overlay group?
  • optimal group size vs overhead (or entropy vs performance)
  • is a gossiping really the key?
  • what if group nodes miss behave and do not gossip/forward? - there is incentives for all nodes to create as much entropy as possible in order for them to also have anonymity lookups.

reddit hackernews


This is a similar approach used by the anonymity system Crowds, which uses the concept of "blending into the crowd" (https://dl.acm.org/citation.cfm?id=290168)

DHT protocol routing mechanisms

Each DHT protocol implements different routing mechanisms. Describe how different DHT protocols perform routing and how are they vulnerable to metadata attacks.

  • Kademlia
  • CAN
  • Chord
  • Pastry
  • Tapestry

Measuring anonymity / privacy in P2P networks

Measuring anonymity of systems is important to assess, compare and design anonymity of p2p systems. This thread contains research summaries, ideas and notes on how to measure anonymity/privacy in P2P systems and how to apply those measures to production systems.

[moved] check the notes and paper summaries at gpestana/p2psec#3

casey commented

I had an idea for at least a slight improvement in DHT privacy, and was curious if you've encountered anything similar.

I was thinking about it in the context of BitTorrent, where it would be nice to make passive surveillance harder. The basic idea is to use public keys as identifiers, instead of hashes, and require, in the protocol, that nodes respond to a challenge where they must produce a signature with the corresponding private key in order to receive responses to their queries.

For a BitTorrent use-case, this would look like:

  • Use the torrent infohash as a private key, and derive a corresponding public key
  • Make DHT requests as usual, and respond to challenges using the private key

This would make passive surveillance harder. In particular, it would make indexing the DHT just by watching queries impossible. An attacker would be able to see pubkeys, but these pubkeys would be opaque to him, unless he knew the underlying infohashes. The attacker could not, upon getting a request for a key, make his own requests for the same key, since he would be unable to produce signatures in response to challenges.