Feature can be abused to create cross-site covert channels
pes10k opened this issue · 33 comments
This issue is being filed as part of the requested PING review
This feature can be abused to create a cross-site covert channel; one site can write to the channel by manipulating the state of the CPU, and another site can read from the channel by using the proposed API to learn when the state of the CPU has changed.
The spec attempts to guard against this abuse in two ways:
i. rate limiting how frequently a reading site can read compute-pressure changes (every 1s for an active document, every 10s for a non-active document)
ii. only allowing one site to read from the channel at a time
I do not think either of these mitigations is sufficient as a defense, though. For the first point, the spec says it's intended to be used on pages users are likely to dwell on for a long time (e.g., video conferencing sites). In such a case, learning a bit of information every second is not a useful limitation. If a user is, for example, video conferencing for 5 minutes, that's at least 300 bits of information, plenty to encode a unique identifier and any necessary metadata. ("At least" because if there are 4 states that can be transmitted, you're sending 2 bits of info per second, and so a clever attacker could double the bandwidth.)
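The arithmetic in the comment above can be checked with a small sketch. The function name and parameters are illustrative only (not part of the spec or the API), and the model assumes a noise-free channel that carries one symbol per allowed update.

```typescript
// Whole bits an attacker can pack into one symbol when `distinctStates`
// different pressure states can be signalled (floor of log2).
function wholeBitsPerSymbol(distinctStates: number): number {
  let bits = 0;
  while ((1 << (bits + 1)) <= distinctStates) {
    bits++;
  }
  return bits;
}

// Upper bound on the bits carried while both sites stay open.
function channelCapacityBits(
  durationSeconds: number,
  updateIntervalSeconds: number,
  distinctStates: number,
): number {
  const symbols = Math.floor(durationSeconds / updateIntervalSeconds);
  return symbols * wholeBitsPerSymbol(distinctStates);
}

// Five-minute call, one update per second:
const twoStates = channelCapacityBits(5 * 60, 1, 2); // 300 bits
const fourStates = channelCapacityBits(5 * 60, 1, 4); // 600 bits
// Same call with the 10 s interval for non-active documents:
const nonActive = channelCapacityBits(5 * 60, 10, 4); // 60 bits
```

Even the slower 10 s interval leaves room for a short identifier over a long session, which is the concern raised here.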
Thanks @pes10k, I think this threat is worth documenting per #204 so that we can assess its feasibility and recommend appropriate mitigations accordingly. Feel free to drop pointers here from other domains where this type of vector may have been discussed (in PING or outside), even if as a generic case.
As discussed in the context of #216 I'd recommend incorporating this proposed cross-site covert channel attack into the Types of privacy and security threats section to ensure this is carefully considered and mitigations improved as appropriate.
@pes10k, PTAL this strawman proposal inspired by your contribution and provide feedback and suggestions for further improvements:
Cross-site covert channel
In computer security, a covert channel creates a capability to transfer information between processes that are not supposed to be allowed to communicate. In modern multi-process web engines, in the generic case, each window or tab resides in its own process (documents that have the same origin or sites that have the same site typically share the same process). Using this API it may be possible to create a cross-site covert channel C where a site A on one tab first writes to the channel C after having manipulated the state of the CPU. Next, a site B (that is not same-site with site A) on another tab reads from the channel C by using this API to learn when the state of the CPU has changed. This process is repeated as long as the scripts run on both sites A and B.
This attack is in part mitigated by Rate-limiting change notifications. Implementers are advised to consider additional mitigations for long-running scripts.
NOTE
The longer the scripts run, the more information can be transmitted using the proposed cross-site covert channel. For example, if a user is on a video conferencing site and another long-running site, more information can be transferred than in a regular browsing scenario.
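To make the write/read roles concrete, here is a minimal, noise-free model of the channel described above. It is a sketch under assumptions: the two-bit-per-state mapping and the function names are invented for illustration, both sites are assumed to share a clock (one symbol per tick, so a tick with no change notification means the state repeated), and no real CPU load or API call is involved.

```typescript
type PressureState = "nominal" | "fair" | "serious" | "critical";

// Hypothetical mapping agreed on by both sites in advance.
const SYMBOL_TO_STATE: Record<string, PressureState> = {
  "00": "nominal",
  "01": "fair",
  "10": "serious",
  "11": "critical",
};
const STATE_TO_SYMBOL: Record<PressureState, string> = {
  nominal: "00",
  fair: "01",
  serious: "10",
  critical: "11",
};

// Site A (writer): turn a bit string into the sequence of states it
// would try to force the CPU into, one state per tick.
function encode(bits: string): PressureState[] {
  if (bits.length % 2 !== 0 || /[^01]/.test(bits)) {
    throw new Error("expected an even-length string of 0s and 1s");
  }
  const states: PressureState[] = [];
  for (let i = 0; i < bits.length; i += 2) {
    states.push(SYMBOL_TO_STATE[bits.slice(i, i + 2)]);
  }
  return states;
}

// Site B (reader): map the state seen at each tick back to bits.
function decode(observed: PressureState[]): string {
  return observed.map((s) => STATE_TO_SYMBOL[s]).join("");
}

const sent = "0110110001";
const received = decode(encode(sent)); // equals `sent` in this noise-free model
```

A real attack would also have to cope with other load on the machine and with clock drift, which is part of why a proof of concept is useful before fixing mitigation parameters.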
@anssiko i think this is a good change, but i think something further is needed in terms of in-spec mitigation. Some questions:
- can i circumvent the rate limiting by activating and deactivating the listener, since i get the first message immediately on attaching / activating the listener?
- i didn't see any change to address the concern that one-bit-a-second isn't sufficiently slow to prevent the attack (since a primary use case for this feature is long-lived sites like video conferencing). I suggest one of the following approaches (in order of preference)
a. limit the number of times the signal can flip. A listener can find out that the pressure moved to a new status a maximum of (say) 5 times, and then the listener no longer receives messages. I think this would achieve the main goal of the spec (allowing a processing-intense app to cool off if the system is being pegged) while restricting the amount of information that could be sent through the channel
b. have some exponential decay in the update rate. So, the longer you're listening on the channel, the less frequent the updates are
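The two options above could look roughly like the following. This is only a sketch: `FlipBudget`, `minIntervalMs`, and all numeric values are hypothetical and do not come from the spec.

```typescript
// Option (a): a listener may learn about at most `maxFlips` state changes.
class FlipBudget {
  private remaining: number;

  constructor(maxFlips: number) {
    this.remaining = maxFlips;
  }

  // Returns true if the change may be delivered, spending one unit of budget.
  tryDeliver(): boolean {
    if (this.remaining <= 0) {
      return false;
    }
    this.remaining -= 1;
    return true;
  }
}

// Option (b): the minimum gap before the next update grows geometrically
// with the number of updates already delivered, up to a cap.
function minIntervalMs(
  updatesDelivered: number,
  baseMs: number,
  factor: number,
  capMs: number,
): number {
  return Math.min(capMs, baseMs * Math.pow(factor, updatesDelivered));
}

const budget = new FlipBudget(5);
const firstGap = minIntervalMs(0, 1000, 2, 60000); // 1000 ms
const fourthGap = minIntervalMs(3, 1000, 2, 60000); // 8000 ms
```

With option (a) the total information leaked per listener is bounded regardless of session length; with option (b) it still grows, but only logarithmically in the session length until the cap is reached.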
We can do a bit more. Below is roughly what I was thinking; it overlaps a bit with your suggestions
Rate obfuscation
One option would be to put a limit on how many change events are acceptable, say per minute, and if that is reached, maybe postpone reporting for say 5-10 seconds. We could detect if abnormal behavior is happening, like say 10 change events spanning across multiple states and then delay reporting by a random value and only report the latest change.
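A rough sketch of this idea, assuming a one-minute window and the 5-10 second postponement mentioned above. The names `ObfuscationConfig`, `reportingDelayMs`, and `coalesce` are made up for illustration, and the threshold is a placeholder.

```typescript
interface ObfuscationConfig {
  windowMs: number; // sliding window length, e.g. one minute
  maxChangesPerWindow: number; // changes tolerated inside the window
  minPenaltyMs: number; // lower bound of the random postponement
  maxPenaltyMs: number; // upper bound of the random postponement
}

// Returns how long the next notification should be held back (0 = no delay).
// `random` is injectable so the behaviour can be tested deterministically.
function reportingDelayMs(
  changeTimesMs: number[],
  nowMs: number,
  cfg: ObfuscationConfig,
  random: () => number = Math.random,
): number {
  const recent = changeTimesMs.filter(
    (t) => t <= nowMs && nowMs - t < cfg.windowMs,
  );
  if (recent.length <= cfg.maxChangesPerWindow) {
    return 0;
  }
  return cfg.minPenaltyMs + random() * (cfg.maxPenaltyMs - cfg.minPenaltyMs);
}

// While a delay is pending, intermediate changes are dropped and only the
// latest state is reported.
function coalesce<T>(pending: T[]): T | undefined {
  return pending.length > 0 ? pending[pending.length - 1] : undefined;
}

const exampleConfig: ObfuscationConfig = {
  windowMs: 60000,
  maxChangesPerWindow: 10,
  minPenaltyMs: 5000,
  maxPenaltyMs: 10000,
};
```

Randomizing the delay and dropping intermediate states both attack the reader's ability to stay synchronized with the writer, rather than only lowering the raw update rate.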
Break calibration
Calibration is important to be able to manipulate the CPU into certain states, so slightly changing at runtime the buckets that result in the states would be a mitigation strategy. Another would be to include other hardware signals such as temperature: you could expect the temperature to stay consistently high after continuously going into the “critical” and “serious” states without a cooling-down period.
The broadcaster also cannot recalibrate as that would require using Compute Pressure API, meaning a different origin cannot listen at the same time, and it also needs to be in the foreground.
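One way to picture "slightly changing the buckets at runtime" is to jitter the thresholds that map a utilization reading to a state. This is a sketch only: the baseline thresholds, the jitter size, and the function names are assumptions, and a real implementation may derive states from signals other than a single utilization number.

```typescript
type PressureState = "nominal" | "fair" | "serious" | "critical";
const STATES: PressureState[] = ["nominal", "fair", "serious", "critical"];

// Illustrative baseline boundaries between the four states (fractions of
// full utilization). Placeholder values, not taken from the spec.
const BASE_THRESHOLDS = [0.3, 0.6, 0.9];

// Shift each boundary by up to +/- maxJitter, keeping them ordered and
// inside [0, 1]. `random` is injectable for deterministic testing.
function jitteredThresholds(
  maxJitter: number,
  random: () => number = Math.random,
): number[] {
  return BASE_THRESHOLDS
    .map((t) => t + (random() * 2 - 1) * maxJitter)
    .map((t) => Math.min(1, Math.max(0, t)))
    .sort((a, b) => a - b);
}

// Map a utilization reading to a state using the given boundaries.
function classify(utilization: number, thresholds: number[]): PressureState {
  let index = 0;
  while (index < thresholds.length && utilization >= thresholds[index]) {
    index++;
  }
  return STATES[index];
}

// The same load lands in different buckets depending on the current jitter:
const unshifted = classify(0.62, jitteredThresholds(0.05, () => 0.5)); // "serious"
const shifted = classify(0.62, jitteredThresholds(0.05, () => 1)); // "fair"
```

A writer that calibrated against one set of boundaries can no longer rely on a given load producing the state it intended, which raises the error rate of the channel.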
@kenchris those both sound like useful mitigations to include in the normative part of the spec, i dig um!
@pes10k Do you want this to be specified in algorithms etc., or in a more loose way that allows for innovation but highlights that these or similar mitigations will have to be applied?
I believe @pes10k wants us to have both:
- the "human-readable" description of the mitigations (as in this PR), and
- equivalent steps in the applicable algorithms.
I suggest we use the implementation-defined keyword in respective algorithms to allow implementers to innovate and differentiate. I referenced this definition in this PR too for consistency, see 464a4ae
@anssiko @kenchris my ask is that any full implementation of the spec include mitigations for the privacy harms introduced by the spec. I don't think it's sufficient or compatible with the Web's privacy principles and goals for the spec to "require" privacy harming behaviors/features/capabilities, and then leave it up to implementors to figure out how to deal with that privacy harm.
TL;DR; the mitigations should be just as well defined and "required" as the privacy-risking functionality
I think we can write that mitigations should be in place (required) to avoid the side channel and we could even define algorithms for these.
@zolkis maybe you want to take a stab at that?
@pes10k I'm hearing we're aligned on the big picture. Here's what I think you agree to: PR #219 contains the advisory text, a human-readable description of the proposed attack and its mitigations. You're happy with this part, but you want us to in addition reflect all of this into normative prose, i.e. inline it into the respective algorithms so that it is a "must" (as in RFC 2119) for implementers to comply with.
To make this happen in a coordinated fashion, I propose we merge PR #219 now and work on the respective updates to the algorithms in another PR. That other PR requires close coordination with implementers to ensure all the mitigations are implementable and implemented. What I meant with the implementation-defined keyword in the context of the algorithms is that the keyword should be used in places where implementers may want to e.g. use a different sliding observation window size to fit their product needs. If there's a baseline for the size to ensure an appropriate level of privacy protection, we define the size normatively too, but allow implementers to be stricter than the baseline.
We want to be data-driven and I propose a proof of concept to be developed for the proposed attack to test its feasibility in a real-world scenario. With a PoC at hand we are better informed to specify the details of these proposed mitigations and recommend the minimum baseline. Sounds good? @kenchris will take the PoC exploration.
Thanks for your contributions!
(I had a discussion with one of the WG's Invited Experts, privacy researcher @maryammjd. She may be also interested in reviewing this issue and its accompanying PR #219. Her research interests include real-world privacy problems and she has conducted research on user perspective too.)
(Let me also bring in the WG's Invited Expert, researcher and white hat hacker @toreini. I've also discussed this matter with him. He is familiar with Brave's research paper on covert channels and as such is well-positioned to help the WG perfect the mitigations discussed in this issue and described in PR #219. I'm humbled and pleased to observe we have such great experts as @maryammjd and @toreini working in this important area in this WG!)
@kenchris (about rate obfuscation)
One option would be to put a limit on how many change events are acceptable, say per minute, and if that is reached, maybe postpone reporting for say 5-10 seconds. We could detect if abnormal behavior is happening, like say 10 change events spanning across multiple states and then delay reporting by a random value and only report the latest change.
The 5-10 second delay seems too large and may render the API useless for its purpose (tracking pressure and notifying apps while they can still make relevant changes to react voluntarily). We can introduce some random variations, but in all APIs that have such real-time bounds (to maintain usefulness), it's very hard to mitigate this kind of attack, since the attacker's rate of communication could just be slowed down to the point where it circumvents the mitigations. That slowdown is actually some kind of mitigation already, but I assume we want to do more.
We should also think about what else makes the side channel / steganography more difficult, i.e. makes the conditions less regular or relevant; for instance, making it less likely that the API impl can be triggered to raise a notification by manipulating other parts of the system. That depends on the system, and most of those mitigations belong to the underlying platform rather than the API implementations. Of course the spec should pass down the requirements, preferably without strong suggestions; then the implementations might use platform/HW specific mechanisms to mitigate. But then it's hard to guarantee the outcome.
The question is what to do when those mitigations are not available. Should the API impl switch to a safe but limited mode (to be defined), or be switched off altogether (signaling a generic rather than specific error)?
PR #219 was merged to establish a baseline for further review and comments. See the improved Security and privacy considerations.
Further improvements to be proposed in subsequent PRs or as comments in this issue.
Hi all,
I read through the document. In general, I think it reads well but I think it would be great if we could add discussion on cross-site covert channels in more privacy-focused browsing experiences such as incognito (private) mode as well. What is your opinion on this?
Just to give some context on the reasons why I prompt this:
We did a forensic analysis on private browsing a long time ago (all patched now, except for the resource allocation fingerprinting, a notion similar to a covert channel):
On the privacy of private browsing–a forensic approach -> open access link: https://eprints.ncl.ac.uk/file_store/production/197264/D8ACC693-D092-41D2-8682-1521007F31A6.pdf
Cheers,
Ehsan
@toreini thanks for your review and insightful comments!
Per https://www.w3.org/2001/tag/doc/private-browsing-modes/#features-supporting-private-browsing I think we should informatively guide implementers on options how they might alter the behaviour when in a private browsing mode. The use of the implementation-defined keyword would allow for that even in the context of normative prose. As discussed in that TAG Finding, we don't want the use of a private browsing mode to become a fingerprint itself for this API and as such should not define normatively how implementers must (in RFC 2119 terms) react if such a mode is turned on.
The design goal should be that it would be difficult if not impossible to detect by observing this API alone whether the private browsing mode is on.
Hi Anssi,
Yes, it makes sense, especially as you mentioned this differentiation would make the fingerprinting stand out in unpredictable directions. Would it make sense to suggest something similar to this scenario as a use case example?
@toreini, because this is a generic issue I'd suggest expanding the note in https://www.w3.org/TR/compute-pressure/#mitigation-strategies as follows and refer to the TAG document:
This section gives a high-level view into mitigation strategies applicable to this specification. The normative definitions of these mitigations are integrated into the respective algorithms of this specification. Implementers are advised to consider the TAG guidance on private browsing modes when implementing the mitigations defined in this specification.
Feel free to suggest a better wording. The TAG Finding is open to contributions if you want to contribute your research as an additional reference: https://github.com/w3ctag/private-browsing-modes/issues
Hi @anssiko , this looks good enough to me. The point is to make sure the developers don't neglect these details. I will contact private browsing WG as well! Thanks for suggesting.
Cheers
Ehsan
@pes10k, we've now specified mitigations in normative algorithms to the proposed cross-site covert channel attack. These normative definitions complement the more human-readable description of these mitigations we added to the spec earlier. Here's the summary of changes:
- add new PressureObserver internal slots: [[ObservationWindow]], [[MaxChangesThreshold]], [[PenaltyDuration]], [[ChangesCountMap]], [[AfterPenaltyRecordMap]]
- add passes privacy test check to data delivery steps
- add reset observation window steps
- add adjusted pressure state concept
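For readers who want a feel for how these pieces might fit together, here is a simplified sketch. The field names mirror the internal slots listed above, but the logic is an assumption made for illustration and is not the spec's algorithm text; the adjusted pressure state concept is not modelled, and time is passed in explicitly to keep the example deterministic.

```typescript
class ObserverPrivacyState {
  // Keyed by source type (e.g. "cpu"); plain objects stand in for the
  // spec's ordered maps.
  private changesCountMap: Record<string, number> = {};
  private afterPenaltyRecordMap: Record<string, string> = {};
  private windowStartMs = 0;
  private penaltyUntilMs = 0;

  constructor(
    private observationWindowMs: number,
    private maxChangesThreshold: number,
    private penaltyDurationMs: number,
  ) {}

  // Returns true if a change for `source` may be delivered at `nowMs`.
  passesPrivacyTest(source: string, state: string, nowMs: number): boolean {
    if (nowMs < this.penaltyUntilMs) {
      // Under penalty: remember only the latest state, deliver nothing.
      this.afterPenaltyRecordMap[source] = state;
      return false;
    }
    if (nowMs - this.windowStartMs >= this.observationWindowMs) {
      this.resetObservationWindow(nowMs);
    }
    const count = (this.changesCountMap[source] || 0) + 1;
    this.changesCountMap[source] = count;
    if (count > this.maxChangesThreshold) {
      // Too many changes inside the window: start a penalty period.
      this.penaltyUntilMs = nowMs + this.penaltyDurationMs;
      this.afterPenaltyRecordMap[source] = state;
      this.resetObservationWindow(nowMs);
      return false;
    }
    return true;
  }

  // State to report once the penalty has elapsed, if any was withheld.
  recordAfterPenalty(source: string): string | undefined {
    return this.afterPenaltyRecordMap[source];
  }

  private resetObservationWindow(nowMs: number): void {
    this.changesCountMap = {};
    this.windowStartMs = nowMs;
  }
}
```

In this sketch, an observer constructed with a threshold of 2 delivers the first two changes in a window, withholds the third, and stays silent until the penalty duration has passed.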
It's been an educational ride to mitigate this one and it's been great to work with you on this.
We'd be happy to talk about this work during the PING TPAC meeting on Tuesday 12 Sep if there's availability. I'll attend in person with @kenchris.
Hi @anssiko !
First, apologies for taking so long to reply. This looks terrific; sorry for not getting back to you quicker.
The only remaining feedback I have at this point is:
- some of the parameters in this process are implementation defined. It would be good to have some ceiling or floor for these, to ensure a minimum level of protection (otherwise, an implementation could be a complete and correct implementation of the spec without providing any protection)
- it would be good to have some advisory text in the Security and privacy considerations section about what implementors should consider when selecting "implementation defined" values, and the tradeoffs in privacy vs other goals for different values.
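The floor/ceiling idea in the first point could be expressed as a clamp applied to whatever value an implementation picks. The bounds below are placeholders chosen for illustration, not values from the spec, and the names are hypothetical.

```typescript
interface Bounds {
  min: number;
  max: number;
}

// Hypothetical normative bounds for two implementation-defined parameters.
const PARAMETER_BOUNDS: Record<string, Bounds> = {
  // A ceiling here limits how many changes can leak per window.
  maxChangesThreshold: { min: 1, max: 50 },
  // A floor here guarantees that a penalty actually slows the channel down.
  penaltyDurationMs: { min: 5000, max: 10000 },
};

// Keep an implementation's chosen value inside the normative range.
function clampToBounds(requested: number, bounds: Bounds): number {
  return Math.min(bounds.max, Math.max(bounds.min, requested));
}

// An implementation asking for "no penalty" still gets the floor:
const penalty = clampToBounds(0, PARAMETER_BOUNDS.penaltyDurationMs); // 5000
// An overly generous change threshold is capped at the ceiling:
const threshold = clampToBounds(1000, PARAMETER_BOUNDS.maxChangesThreshold); // 50
```

Which parameters need a floor and which need a ceiling depends on the direction in which they weaken the protection, which is the kind of tradeoff the advisory text in the second point would explain.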
But again, i think this is fantastic, and I'm impressed and grateful for the WG's terrific work here! This is a wonderful privacy improvement!
@pes10k thank you for your continued feedback and advice that allow us to go beyond the usual expectations in this space! The WG is explicitly chartered to develop secure and privacy-preserving specifications and we're truly committed to that. Obviously this wouldn't be possible without collaboration with and contributions from privacy experts across W3C groups and academia.
I'm happy to see we're now on the PING TPAC meeting agenda (Tue 12 Sep, 17:00-18:30 Seville local). I think the approach to implementation-defined keyword, normative baseline floor and ceiling, might be good discussion topics with the broader PING audience. I'll pencil that in as one specific discussion point for this meeting.
As you know, the Infra Standard implementation-defined keyword is used in many specs. Perhaps Infra should give more elaborate advice to spec authors on how to use this keyword in the most appropriate way? Maybe we can propose something that'd generalize to other specs too. Let's discuss at the meeting.
yep @kenchris all good on my end. Again, thanks and congratulations to everyone involved in this subtle but critical privacy protecting improvement. I think this entire process will be an excellent case that other spec authors can model off
Thanks again for all the help! I totally agree
Hi all, I agree this is good for me as well! Well done...