libp2p/go-libp2p-autonat

DialBack error leads to disconnection of incoming connection from this peer

requilence opened this issue · 3 comments

Version information: 0.4.19

Type:

Description:

I have a local node under the non-mappable NAT (node1) with EnableAutoRelay = "true". Also I have publicly available IPFS node(node2) that has EnableAutoNATService = "true".
I manually connect node1 with node2 but noticed that it drops after some time from the swarm list.
After digging into p2p communication between them I found out that in some cases error produced by DialBack (node2 -> node1) connection leads to disconnection of the original connection (node1 -> node2).

Here is the filtered debug log on the node2 side. 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 is the node1's peer ID:

15:52:20.998 DEBUG    bitswap: PeerManager.Connected() 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 peermanager.go:77
15:52:21.048  INFO      relay: new relay stream from: <peer.ID 12*UCANG4> relay.go:235
15:52:28.725 DEBUG autonat-sv: New stream from 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 svc.go:67
15:52:28.725 DEBUG autonat-sv: AutoNATService.handleDial: 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4%!(EXTRA string=/ip4/ip_removed/tcp/51678) svc.go:102
15:52:33.726 DEBUG autonat-sv: error dialing 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4: dial attempt failed: <peer.ID Qm*DadLWX> --> <peer.ID 12*UCANG4> dial attempt failed: dial tcp4 ip_removed:51678: i/o timeout svc.go:188
DHT handleNewMessage stream reset 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 dht_net.go:75
15:55:51.623 DEBUG    bitswap: PeerManager.Disconnected() 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 peermanager.go:94
15:56:23.282 DEBUG    bitswap: PeerManager.Connected() 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 peermanager.go:77
15:56:23.370  INFO      relay: new relay stream from: <peer.ID 12*UCANG4> relay.go:235
15:56:31.007 DEBUG autonat-sv: New stream from 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 svc.go:67
15:56:31.008 DEBUG autonat-sv: AutoNATService.handleDial: 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4%!(EXTRA string=/ip4/ip_removed/tcp/51678) svc.go:102
15:56:36.008 DEBUG autonat-sv: error dialing 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4: dial attempt failed: <peer.ID Qm*DadLWX> --> <peer.ID 12*UCANG4> dial attempt failed: dial tcp4 ip_removed:51678: i/o timeout svc.go:188
DHT handleNewMessage stream reset 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 dht_net.go:75
15:56:50.388 DEBUG    bitswap: PeerManager.Disconnected() 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4 peermanager.go:94
15:57:13.008 DEBUG autonat-sv: Error writing response to 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4: stream reset svc.go:95

Can you show me the right direction so I can help to solve this?

vyzo commented

This has to be accidental, the autonat service uses a background host object for dialbacks.
Maybe it just happened for the connection manager to drop the connection?

Hm... maybe it comes from the autonat client side?

  • I caught this thing multiple times and timings are always matches the autonat check
  • When I turn off autonat-svc on node2 connection doesn't drop after some time

Here is logs from the node1 side (take a look from 15:56:50 events):

15:56:28.295  INFO    autonat: Discovered AutoNAT peer 12D3KooWCYV1JSyrmM8ZGK69igj8KkfpZKRdud1QtAqRR6gwthBe notify.go:33
15:56:30.998 DEBUG    autonat: DialBack 12D3KooWCYV1JSyrmM8ZGK69igj8KkfpZKRdud1QtAqRR6gwthBe autonat.go:141
15:56:30.998 DEBUG    autonat: Dial peer: 12D3KooWCYV1JSyrmM8ZGK69igj8KkfpZKRdud1QtAqRR6gwthBe, message ID: 12D3KooWEMNAaLJqAuN1v7NNGUqqcy32UkZPw8HLE8LteVUCANG4, addrs: [/ip4/127.0.0.1/tcp/35422 /ip4/ip_removed/tcp/35422 /ip6/::1/tcp/35422 /ip4/ip_removed/tcp/51678] client.go:57
15:56:50.395 DEBUG    autonat: Error dialing through 12D3KooWCYV1JSyrmM8ZGK69igj8KkfpZKRdud1QtAqRR6gwthBe: stream reset autonat.go:169
15:56:50.395 DEBUG    bitswap: PeerManager.Disconnected() 12D3KooWCYV1JSyrmM8ZGK69igj8KkfpZKRdud1QtAqRR6gwthBe peermanager.go:94
vyzo commented

It's possible that the extra connections to the autonat service cause the connection manager to go over its limits.
@raulk any ideas?