Network failures after connection establishment
enygren commented
Brian Carpenter pointed out that one class of failures is network-level issues that happen after connection establishment.
NEL currently seems to define "network" issues as those that occur between DNS resolution and a connection being established.
However, there is a range of network issues that can arise after a connection is established.
Some of these are indistinguishable from application layer issues, but others are distinguishable.
It would be valuable to have a way to talk about this class of issues and differentiate them.
For example:
- a NAT or CGNAT or firewall losing state on a longer-running connection may result in a TCP RESET being sent.
- TLS errors can happen post connection establishment for various reasons (eg, in renegotiation or other issues)
- Connections can get broken (eg, receiving a FIN or H2/H3 stream reset unexpectedly)
- QUIC and H2 may also be able to better distinguish between a slow application server and a network path failure (eg, by sending an H2 or QUIC "ping" to the endpoint).
- Path MTU issues (eg, PMTUD) can show up both in connection establishment (often as a TLS handshake failing) and later in connections, and can be very hard to diagnose. It may be harder to figure out whether there's a reasonable way to detect and indicate these.
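To make the distinction concrete, here is a rough sketch (not anything NEL specifies) of how a client might tag errors seen on an already-established connection. The `post_connection` phase label and the `classify_failure` helper are hypothetical names invented for illustration; the type strings are modeled on NEL's existing ones, and the mapping from Python exceptions to those types is an assumption.

```python
import ssl

# Hypothetical phase label: NEL today only has "dns", "connection", "application".
POST_CONNECTION = "post_connection"


def classify_failure(exc: BaseException, established: bool) -> tuple[str, str]:
    """Return a (phase, type) pair for an error seen on a connection.

    `established` says whether the TCP/TLS handshake had already completed
    when the error was raised.
    """
    # Check TLS first: ssl.SSLError is an OSError but not a ConnectionError.
    if isinstance(exc, ssl.SSLError):
        err_type = "tls.protocol.error"
    elif isinstance(exc, ConnectionResetError):
        # eg, a NAT/CGNAT/firewall lost state and a TCP reset was sent.
        err_type = "tcp.reset"
    elif isinstance(exc, TimeoutError):
        err_type = "tcp.timed_out"
    elif isinstance(exc, (BrokenPipeError, ConnectionAbortedError)):
        err_type = "tcp.aborted"
    else:
        err_type = "unknown"

    if not established:
        phase = "connection"
    elif err_type == "unknown":
        # Cannot be told apart from an application-layer issue.
        phase = "application"
    else:
        phase = POST_CONNECTION
    return (phase, err_type)
```

For example, `classify_failure(ConnectionResetError(), established=True)` yields the hypothetical `post_connection` phase with type `tcp.reset`, while the same error before the handshake completes stays in the existing `connection` phase. Path MTU problems and H2/QUIC ping probes are not covered here, since they need protocol-level signals rather than a local exception.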
nicjansma commented
This issue was discussed at W3C TPAC 2024:
- Presentation
- Minutes
- Summary:
- This issue was discussed a little less than #175, but there was confirmation from other CDNs that getting this data can be valuable for diagnosing network issues
- As always, we would have to evaluate what could be exposed in a privacy-safe manner (maybe through differential privacy / aggregate reporting)
- Further discussions were suggested in IETF