w3c/network-error-logging

Network failures after connection establishment

Brian Carpenter pointed out that one class of failures consists of network-level issues that happen after connection establishment.
NEL currently seems to define "network" issues as those occurring between DNS resolution and a connection being established.
However, there is a range of network issues that can arise after a connection is established.
Some of these are indistinguishable from application-layer issues, but others are distinguishable.

It would be valuable to have a way to talk about this class of issues and differentiate them.
For example:

  • A NAT, CGNAT, or firewall losing state on a long-running connection may result in a TCP reset (RST) being sent.
  • TLS errors can occur after connection establishment for various reasons (e.g., during renegotiation).
  • Connections can break (e.g., on receiving an unexpected FIN or H2/H3 stream reset).
  • QUIC and H2 may also be better able to distinguish a slow application server from a network path failure (e.g., by sending an H2 or QUIC "ping" to the endpoint).
  • Path MTU issues (e.g., PMTUD failures) can show up both during connection establishment (often as a failing TLS handshake) and later in a connection, and can be very hard to diagnose. It is less clear whether there is a reasonable way to detect and report these.
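
To make the distinction above concrete, here is a minimal Python sketch of how a client might tag a failure with the phase in which it occurred and a coarse cause. Everything in it is an assumption for illustration: the `ConnectionEvent` fields, the signal names, and the label strings are hypothetical and are not taken from the NEL specification or any browser implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Phase(Enum):
    ESTABLISHMENT = "establishment"
    POST_ESTABLISHMENT = "post_establishment"


@dataclass(frozen=True)
class ConnectionEvent:
    """Hypothetical signals a client might observe for one failure."""
    established: bool                  # transport and TLS (or QUIC) handshake completed
    signal: str                        # "tcp_rst", "tcp_fin", "tls_alert", "stream_reset", "timeout"
    ping_acked: Optional[bool] = None  # outcome of an H2/QUIC ping probe, if one was sent


# Hypothetical labels; none of these strings are defined by NEL.
_LABELS = {
    "tcp_rst": "network.tcp_reset",
    "tcp_fin": "network.unexpected_close",
    "tls_alert": "network.tls_error",
    "stream_reset": "network.stream_reset",
}


def classify(event: ConnectionEvent) -> Tuple[Phase, str]:
    """Return (phase, label) for a failure event."""
    phase = Phase.POST_ESTABLISHMENT if event.established else Phase.ESTABLISHMENT
    if event.signal == "timeout":
        # An answered ping suggests the path is up and the application is slow.
        # An unanswered ping points at the path (or the endpoint itself).
        if event.ping_acked is True:
            return phase, "application.slow"
        if event.ping_acked is False:
            return phase, "network.path_failure"
        return phase, "unknown"
    return phase, _LABELS.get(event.signal, "unknown")
```

For example, `classify(ConnectionEvent(established=True, signal="timeout", ping_acked=False))` yields the post-establishment phase with the `network.path_failure` label. Path MTU problems are deliberately left out of the sketch, since it is unclear what signal would identify them reliably.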

This issue was discussed at W3C TPAC 2024:

  • Presentation
  • Minutes
  • Summary:
    • This issue was discussed a little less than #175, but other CDNs confirmed that this data can be valuable for diagnosing network issues
    • As always, we would have to evaluate what could be exposed in a privacy-safe manner (maybe through differential privacy / aggregate reporting)
    • It was suggested that further discussion take place in the IETF
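
As one illustration of what "differential privacy / aggregate reporting" could mean here, the following Python sketch adds Laplace noise to per-category failure counts before they are released. It is only a sketch under stated assumptions: the function name, the category labels, and the premise that each client contributes at most one report (sensitivity 1) are all hypothetical, and nothing like this was agreed in the discussion.

```python
import random
from typing import Dict, Optional


def noisy_counts(counts: Dict[str, int],
                 epsilon: float,
                 rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Return counts with Laplace(0, 1/epsilon) noise added to each bucket.

    Assumes each client contributes at most one report in total, so changing
    one client's data changes one bucket by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    rate = epsilon  # exponential rate; the Laplace scale is 1 / epsilon

    def laplace() -> float:
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        return rng.expovariate(rate) - rng.expovariate(rate)

    return {label: count + laplace() for label, count in counts.items()}
```

A collector might call `noisy_counts({"network.tcp_reset": 120, "network.path_failure": 7}, epsilon=1.0)`. A smaller `epsilon` gives stronger privacy and noisier counts, so rare failure categories may be swamped by noise.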