w3c/network-error-logging

Clear report body fields based on `phase` and report destination

chlily1 opened this issue · 8 comments

NEL reports include information about the request that doesn't seem necessary for troubleshooting certain types of errors.

Some of these fields are cleared when non-DNS-phase reports are "downgraded", but such downgrading is not applied to all DNS and connection phase errors.

For troubleshooting DNS phase errors, only the hostname (not the full URL) is relevant, i.e., the path and query can be omitted. The server_ip, response_headers, and status_code won't be available anyway. The referrer, protocol, method, and request_headers, however, are included in the report even though they are neither relevant to DNS resolution nor visible to DNS servers.

For troubleshooting connection phase errors, the full URL is also not necessary, nor are the request_headers and method.

To protect privacy, should these fields be cleared depending on the error phase? Or perhaps, to prevent unnecessary leakage of information across origins, should they be cleared only if the collecting Reporting endpoint does not share an origin with the NEL policy?

Would the following be a reasonable way to protect privacy while still retaining the utility of NEL?

  • For DNS phase reports, only sampling_fraction, elapsed_time, phase, and type are included in report bodies, and the URL is truncated to the origin only.
  • For connection phase reports, only server_ip, protocol, sampling_fraction, elapsed_time, phase, and type are included in report bodies, and the URL is truncated to the origin only.
  • For other phases (i.e. application phase), all report body fields and the full URL are included in reports to the same origin, but only the origin portion of the URL, sampling_fraction, elapsed_time, phase, and type are included in reports to a different origin.

Correction: server_ip might be non-empty for a DNS phase report if it is a downgraded dns.address_changed report, in which case it should be included in the report.
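
For illustration only, the proposed rules (including the server_ip correction) could be sketched roughly as below. This is not spec text or any browser's implementation. The report shape (a dict with `url` and `body`), the helper names `origin_of` and `filter_report`, and the choice to treat "same origin" as "the reported URL's origin equals the collector endpoint's origin" are all assumptions made for the sketch. The origin comparison is also simplified and does not normalize default ports.

```python
from urllib.parse import urlsplit

# Body fields kept in every case, per the proposal above.
BASE_FIELDS = frozenset({"sampling_fraction", "elapsed_time", "phase", "type"})
# Connection phase reports additionally keep server_ip and protocol.
CONNECTION_FIELDS = BASE_FIELDS | {"server_ip", "protocol"}


def origin_of(url):
    """Return scheme://host[:port], dropping userinfo, path, query, and fragment."""
    parts = urlsplit(url)
    host_port = parts.netloc.rpartition("@")[2]
    return f"{parts.scheme}://{host_port}"


def filter_report(report, endpoint_url):
    """Hypothetical helper: trim a NEL report according to the proposed rules."""
    body = report["body"]
    phase = body.get("phase")
    report_origin = origin_of(report["url"])

    if phase == "dns":
        allowed = set(BASE_FIELDS)
        # Correction: a downgraded dns.address_changed report may carry a
        # non-empty server_ip, which should be kept.
        if body.get("server_ip"):
            allowed.add("server_ip")
    elif phase == "connection":
        allowed = CONNECTION_FIELDS
    elif report_origin == origin_of(endpoint_url):
        # Other phases, same-origin collector: full URL and all body fields.
        return {"url": report["url"], "body": dict(body)}
    else:
        # Other phases, cross-origin collector: minimal fields only.
        allowed = BASE_FIELDS

    trimmed = {k: v for k, v in body.items() if k in allowed}
    return {"url": report_origin, "body": trimmed}
```

With this sketch, a DNS phase report for `https://www.example.com/a/b?q=1` would be delivered with its URL reduced to `https://www.example.com` and with referrer, method, and request_headers dropped, regardless of where the collector lives.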

> To protect privacy, should these fields be cleared depending on the error phase?

+1 for this. Useless data is ... useless.

> Or perhaps, to prevent unnecessary leakage of information across origins, should they be cleared only if the collecting Reporting endpoint does not share an origin with the NEL policy?

I predict many origins will use a third-party service for collecting NEL reports, just as many origins do today to collect RUM performance data.
If an origin trusts a third party to be its Reporting endpoint, that is its choice, and it should not result in loss of data and insights.

E.g., for application phase reports, reducing the URL to only its origin (https://www.nike.com/shoes/men/sale becomes https://www.nike.com) would basically render a third-party service useless as a Reporting endpoint.
As a DevOps engineer at Nike, I want to know which URLs returned 400 errors, not just which origins.