Network failures after connection establishment

Question

Brian Carpenter pointed out a class of failures: network-level issues that happen after a connection is established.
NEL currently seems to define "network" failures as those occurring between DNS resolution and connection establishment.
However, a range of network issues can arise after a connection is established.
Some of these are indistinguishable from application-layer issues, but others can be distinguished.

It would be valuable to have a way to talk about this class of issues and differentiate them.
For example:

- A NAT, CGNAT, or firewall losing state on a long-running connection may result in a TCP RST being sent.
- TLS errors can occur after connection establishment for various reasons (e.g., during renegotiation, among other causes).
- Connections can be broken unexpectedly (e.g., by an unexpected FIN or an H2/H3 stream reset).
- QUIC and H2 may also be able to better distinguish a slow application server from a network path failure (e.g., by sending an H2 or QUIC PING to the endpoint).
- Path MTU issues (e.g., PMTUD failures) can show up during connection establishment (often as a failing TLS handshake) or later in a connection, and can be very hard to diagnose. For these, it may be harder to determine whether there is a reasonable way to detect and report them.
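To make the distinctions in the list above more concrete, here is a minimal Python sketch of how a client might bucket failures observed on an already-established connection. The category labels (`post_establishment.network` and so on) and all function names are hypothetical and are not NEL report types. The mapping from Python's standard exceptions is an assumption about how these failures typically surface through a socket API, not something NEL specifies. The `ping_acked` flag stands in for whatever an H2 or QUIC implementation reports; the Python standard library has no H2/QUIC PING API.

```python
import socket
import ssl

# Hypothetical category labels for illustration only; these are NOT NEL report types.
NETWORK = "post_establishment.network"
TLS = "post_establishment.tls"
UNEXPECTED_CLOSE = "post_establishment.unexpected_close"
AMBIGUOUS = "ambiguous"
OK = "ok"


def classify_post_establishment_error(exc: BaseException) -> str:
    """Bucket an exception raised on a connection that was already established."""
    if isinstance(exc, ssl.SSLError):
        # TLS failure after the handshake, e.g. during renegotiation or on a bad record.
        return TLS
    if isinstance(exc, (ConnectionResetError, BrokenPipeError)):
        # The connection was torn down underneath us (typically a TCP RST),
        # possibly by a NAT/CGNAT/firewall that lost state.
        return NETWORK
    # Timeouts and anything else cannot be attributed to the network on their own.
    return AMBIGUOUS


def classify_read(data: bytes, expecting_more: bool) -> str:
    """An empty read means the peer sent FIN; that is only a failure if more data was due."""
    if data == b"" and expecting_more:
        return UNEXPECTED_CLOSE
    return OK


def classify_stall(ping_acked: bool) -> str:
    """A response is overdue and an H2/QUIC PING was sent on the same connection.

    If the PING was acknowledged, the path still works, so the delay is more
    likely at the application server; otherwise a path failure is suspected.
    """
    return "slow_application" if ping_acked else "path_failure_suspected"


if __name__ == "__main__":
    # Simulate an unexpected close: the peer end of a local socket pair goes away.
    a, b = socket.socketpair()
    b.close()
    print(classify_read(a.recv(1024), expecting_more=True))
    a.close()
```

The timeout case is deliberately left as `ambiguous`: a read timeout by itself looks the same whether the server is slow or the path is broken, which is exactly the gap an H2/QUIC PING could help close.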

Answer 1 · 2024-10-21T20:30:53.000Z

This issue was discussed at W3C TPAC 2024:

Presentation
Minutes
Summary:
- This issue received a little less discussion than #175, but other CDNs confirmed that this data would be valuable for diagnosing network issues
- As always, we would have to evaluate what could be exposed in a privacy-safe manner (for example, through differential privacy or aggregate reporting)
- Further discussion at the IETF was suggested
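The summary only floats differential privacy and aggregate reporting as possibilities; no mechanism was agreed. Purely as an illustration of what "aggregate reporting with differential privacy" could mean here, this is a minimal sketch of the standard Laplace mechanism applied to per-category counts of post-establishment failures. The category names, function names, and the assumption that each client contributes at most one event are all hypothetical; a real deployment would need its own privacy analysis.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_failure_counts(events, categories, epsilon, rng):
    """Release a noisy count for every category in a fixed, public list.

    Assumes (hypothetically) that each client contributes at most one event and
    that neighbouring datasets differ by adding or removing one client, so the
    histogram's L1 sensitivity is 1 and Laplace noise with scale 1/epsilon
    gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(e for e in events if e in categories)
    scale = 1.0 / epsilon
    # Every category gets a value, even a zero count, so missing keys leak nothing.
    return {c: counts[c] + laplace_noise(scale, rng) for c in categories}


if __name__ == "__main__":
    categories = ["tcp_reset", "tls_error", "unexpected_close"]
    events = ["tcp_reset", "tcp_reset", "unexpected_close"]
    print(noisy_failure_counts(events, categories, epsilon=1.0, rng=random.Random(0)))
```

Smaller `epsilon` means stronger privacy and more noise: each released count has a noise standard deviation of sqrt(2)/epsilon, so useful signal only emerges once many clients are aggregated.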