privacycg/proposals

Bounce Tracking Protection

johnwilander opened this issue · 75 comments

In the spirit of a community group, we’d like to share some of our Intelligent Tracking Prevention (ITP) research and see if cooperation can get us all to better tracking prevention for a problem we call bounce tracking.

Safari’s Old Cookie Policy

The original Safari default cookie policy, circa 2003, was this: Cookies may not be set in a third-party context unless the domain already has a cookie set in a first-party context. This effectively meant you had to “seed” your cookie jar as first party.

Bounce Tracking

When working on what became ITP, our research found that trackers were bypassing the third-party cookie policy through a pattern we call "bounce tracking" or "redirect tracking." Here's how it works:

  1. The content publisher's page embeds a third-party script from tracker.example.
  2. The third-party script tries to read third-party cookies for tracker.example.
  3. If it can't, it redirects the top level to tracker.example using window.location or by hijacking all links on the page.
  4. tracker.example is now first party and sets a cookie—it seeds its cookie jar.
  5. tracker.example redirects back to the original page URL or to the intended link destination.
  6. The tracker.example cookie can now be read back in third-party contexts.

Modern tracking prevention features generally block both reading and writing cookies in third-party contexts for domains believed to be trackers. However, it's easy to modify bounce tracking to circumvent such tracking prevention. Step 5 simply needs to pass the cookie value in a URL parameter, and step 6 can stash it in first-party storage on the landing page.
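
To make the modified flow concrete, here is a minimal toy simulation of the six steps with the URL-parameter variant. It is only a sketch: `ToyBrowser`, `bounce`, and the `tid` parameter name are hypothetical and do not model any real browser or tracker.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyBrowser:
    """Hypothetical browser with third-party cookie access fully blocked."""
    cookies: Dict[str, str] = field(default_factory=dict)  # first-party jars, keyed by site
    site_storage: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def read_third_party_cookie(self, site: str) -> Optional[str]:
        # Tracking prevention: reads in a third-party context always fail.
        return None

    def load_as_first_party(self, site: str) -> str:
        # Steps 3-4: the tracker is top level, so it can read or seed its own cookie.
        return self.cookies.setdefault(site, uuid.uuid4().hex)


def bounce(browser: ToyBrowser, publisher: str, tracker: str) -> str:
    """Return the ID the tracker's script ends up holding on the publisher's page."""
    user_id = browser.read_third_party_cookie(tracker)      # step 2
    if user_id is None:
        cookie_value = browser.load_as_first_party(tracker)  # steps 3-4
        # Step 5: redirect back with the cookie value in a URL parameter.
        redirect_back = f"https://{publisher}/?tid={cookie_value}"
        user_id = redirect_back.split("tid=")[1]
    # Step 6: stash the ID in the publisher's first-party storage.
    browser.site_storage.setdefault(publisher, {})["tid"] = user_id
    return user_id
```

In this model, calling `bounce` for two different publishers with the same tracker yields the same ID on both sites, which is the cross-site linkage the defense needs to break.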

Bounce tracking is also hard to defend against since at the time of the request, the browser doesn’t know if it’ll be redirected.

Safari’s Current Defense Against Bounce Tracking

ITP defends against bounce tracking by periodically purging storage for classified domains that the user doesn’t interact with. Doing navigational redirection is one of the conditions that can get a domain classified by ITP so being a “pure bounce tracker” that never shows up in a third-party context does not suffice to avoid classification. The remaining issue is potential bounce tracking by sites that do not get their storage purged, for instance due to the fact that the user is logged in to the site and uses it.
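
A rough sketch of such a periodic purge, under stated assumptions: the 30-day window, the `SiteRecord` fields, and `purge_pass` are illustrative and are not taken from ITP's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PURGE_AFTER_DAYS = 30  # assumed window, not a value stated in this issue


@dataclass
class SiteRecord:
    classified: bool                     # flagged, e.g. for navigational redirection
    last_interaction_day: Optional[int]  # None means the user never interacted
    has_storage: bool = True


def purge_pass(records: Dict[str, SiteRecord], today: int) -> List[str]:
    """Purge storage for classified domains without recent user interaction."""
    purged = []
    for domain, rec in records.items():
        if not rec.classified or not rec.has_storage:
            continue
        idle = (rec.last_interaction_day is None
                or today - rec.last_interaction_day >= PURGE_AFTER_DAYS)
        if idle:
            rec.has_storage = False
            purged.append(domain)
    return purged
```

A classified site the user interacted with a few days ago keeps its storage in this sketch, which corresponds to the remaining issue described above.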

Can Privacy CG Find a Comprehensive Defense?

We believe other browsers with tracking prevention have no defense against bounce tracking (please correct if this is inaccurate) and it seems likely that bounce tracking is in active use. Because we've described bounce tracking publicly before, we don't consider the details in this issue to be a new privacy vulnerability disclosure. But we'd like the Privacy CG to define some kind of defense.

Here are a few ideas to get us started:

  • Adopt ITP’s current defense. This could be done as a periodic purge of cookies for websites without user interaction or combined with a classifier that only subjects domains that show bounce tracking behavior to this periodic purge.
  • Detect bounce tracking patterns and put offenders in a SameSite=Strict jail. This would mean the user can still be logged in to the offending websites by loading them directly, but they would see no cookies when they engaged in bounce tracking. Note though that a bounce tracker can navigate publisher.example–>tracker.example–>tracker.example–>destination.example where the second navigation to tracker.example is same-site and will possibly reveal SameSite=Strict cookies. SameSite=Strict cookies may have to be hardened against this kind of attack.
  • Detect bounce tracking patterns and put the offenders on some kind of block list. This could be done on-device based on web browsing or centralized through crawls. However, it would lead to broken page loads.
  • Detect a redirect on the response and re-raise the bounce request without cookies. This has load performance costs, could break some OAuth flows, and only addresses the “carry ID forward” part of the tracking, not the “user X clicked link Y on website Z” part. This protection would also be vulnerable to correlation between the initial request carrying cookies and the re-raised one.
  • Purge non-SameSite=Strict cookies after the domain has shown bounce tracking behavior or by combining it with the block list approach mentioned above.
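
The same-site hop concern in the SameSite=Strict jail idea can be illustrated with a deliberately simplified model. It assumes Strict cookies are attached whenever the navigation's initiating site equals the target site, and it ignores how real browsers treat redirect chains; `strict_cookies_sent` and `hops_with_strict_cookies` are hypothetical helpers.

```python
from typing import List, Tuple


def strict_cookies_sent(initiator_site: str, target_site: str) -> bool:
    # Simplified rule: Strict cookies attach only to same-site navigations.
    return initiator_site == target_site


def hops_with_strict_cookies(chain: List[str]) -> List[Tuple[str, str, bool]]:
    """For each navigation in the chain, report whether Strict cookies are sent."""
    return [(src, dst, strict_cookies_sent(src, dst))
            for src, dst in zip(chain, chain[1:])]


chain = ["publisher.example", "tracker.example",
         "tracker.example", "destination.example"]
for src, dst, sent in hops_with_strict_cookies(chain):
    print(f"{src} -> {dst}: Strict cookies sent = {sent}")
```

Under this model only the second hop, from tracker.example to itself, carries the Strict cookies, which is exactly the hop a bounce tracker would add on purpose.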

Some context, we did some measurement of this ~1 yr ago.

https://brave.com/redirection-based-tracking/

I'd be very interested in other numbers folks might have, especially as it might help us understand how this risk compares against other risks on the platform. (not that we shouldn't trace down every leak in the platform, but interested in the highest-marginal-benefit ranking)

To highlight a similar mechanism for completeness, sorry if it's documented elsewhere and not considered bounce tracking:

  1. The content publisher's page embeds a third-party script from tracker.example.
  2. The third-party script tries to read third-party cookies for tracker.example.
  3. If it can't, it injects a tracker.example iframe on the publisher's page.
  4. User clicks on content in the iframe (intentionally or via click-jacking).
  5. Using window.open, a new tab/window is opened for tracker.example.
  6. tracker.example window is now first party and can read or write cookies.
  7. tracker.example window accesses a function on tracker.example iframe, via window.opener, to pass an identifier.
  8. tracker.example window closes itself, and was only open for a short amount of time.
  9. Identifier can be passed to initial third-party script via postMessage and stored in first-party storage for continued tracking on the site.

To highlight a similar mechanism for completeness, sorry if it's documented elsewhere and not considered bounce tracking:

  1. The content publisher's page embeds a third-party script from tracker.example.
  2. The third-party script tries to read third-party cookies for tracker.example.
  3. If it can't, it injects a tracker.example iframe on the publisher's page.
  4. User clicks on content in the iframe (intentionally or via click-jacking).
  5. Using window.open, a new tab/window is opened for tracker.example.
  6. tracker.example window is now first party and can read or write cookies.
  7. tracker.example window accesses a function on tracker.example iframe, via window.opener, to pass an identifier.
  8. tracker.example window closes itself, and was only open for a short amount of time.
  9. Identifier can be passed to initial third-party script via postMessage and stored in first-party storage for continued tracking on the site.

Interesting. Have you seen this in the wild for tracking purposes? I wouldn't call it bounce tracking, rather insta-popup tracking or brief popup tracking. Maybe file as individual issue? I'd love to hash it out.

We should definitely defend against this kind of pop-up based tracking if it's being exploited. (Unfortunately, it sounds rather similar to pop-up based OAuth flows.)

Some context, we did some measurement of this ~1 yr ago.

https://brave.com/redirection-based-tracking/

I'd be very interested in other numbers folks might have, especially as it might help us understand how this risk compares against other risks on the platform.

This is cool data!

I had a hard time figuring out how prevalent bounce tracking is from that post, or in particular the subset we are calling bounce tracking. Bounce tracking is a subset of first-party redirect tracking (which also includes things like URL shorteners, or sites that send all outgoing links through a redirect) which in turn is a subset of redirect tracking (which can include redirects in third-party context). But I couldn't even figure out how prevalent redirect tracking in general is. Help appreciated!

As mentioned by John, we know that at least one significant tracking firm was providing bounce tracking for Safari users across a number of sites, before ITP rendered it ineffective. We don't know current prevalence though.

@othermaciej thanks! We used the measurements to estimate how often users would hit bounce trackers, using a bunch of random walks of the web, weighted by initial website popularity. Not perfect of course, but a useful first cut.

Our initial plan was to use the above to identify bounce tracking domains (e.g. domains that have storage read or written to, but were only intermediate in some bounce). E.g.

  1. click link on popular site
  2. arbitrary number of other eTLD+1s the user is 301'ed or otherwise, where something storage related happens
  3. finally land on another eTLD+1 that's distinct from all the above.

the "2"s in the above are what got counted in the research behind the blog post, and what we started building lists of. The short term plan was to just block storage on these domains, even in 1p context, until there was a user gesture.
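
One way to read the counting rule above as code. This is a sketch only: `Hop` and `bounce_candidates` are hypothetical names, sites are assumed to be already reduced to eTLD+1, and the storage flag is assumed to come from instrumentation during the crawl.

```python
from typing import List, NamedTuple, Set


class Hop(NamedTuple):
    site: str              # eTLD+1 of this step in the navigation chain
    touched_storage: bool  # storage was read or written during this step


def bounce_candidates(chain: List[Hop]) -> Set[str]:
    """Return intermediate sites that touched storage in a qualifying chain."""
    if len(chain) < 3:
        return set()  # no intermediate hop
    origin, destination = chain[0].site, chain[-1].site
    # Step 3: the landing site must be distinct from every earlier site.
    if destination in {hop.site for hop in chain[:-1]}:
        return set()
    # Step 2: intermediates where something storage related happened.
    return {hop.site for hop in chain[1:-1]
            if hop.touched_storage and hop.site != origin}
```

For example, a chain from a popular site through `t1.example` (storage touched) and `cdn.example` (no storage) to a distinct shop would yield only `t1.example`.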

We had some more ideas too that were promising, but have (so far) been triaged down, since we didn't think this was where the best "privacy bang for the buck was", but the partial list includes:

  • stage all storage between navigations, during redirection chains, and discard if it looked like bounce tracking
  • use the list to identify places we could continuously probe, and strip URL / query information off during navigations where the redirection chain wound up in the same place, to identify params that could be removed or altered without affecting where the navigation wound up
  • see if we could figure out which params in the intermediate step contained the final URL destination information, and if so, if we could just extract that when leaving the initial domain and "short circuit" the user to the final domain.
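
The "short circuit" idea in the last bullet could look roughly like the following. It is a sketch under a strong assumption: the intermediate URL carries the final destination as a full URL in one of its query parameters. `embedded_destination` and the example parameter names are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def embedded_destination(bounce_url: str) -> Optional[str]:
    """Return the first query value that is an http(s) URL on another host."""
    parts = urlsplit(bounce_url)
    for _name, value in parse_qsl(parts.query):
        candidate = urlsplit(value)
        if (candidate.scheme in ("http", "https")
                and candidate.hostname
                and candidate.hostname != parts.hostname):
            return value
    return None  # nothing to short-circuit to


url = ("https://tracker.example/r?uid=abc123"
       "&dest=https%3A%2F%2Fshop.example%2Fitem%3Fid%3D7")
print(embedded_destination(url))
```

A browser applying this would navigate straight to the extracted URL and never load the intermediate domain. Real trackers may encode or sign the destination, in which case this heuristic finds nothing and the navigation proceeds as usual.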

(all "white boarding stage stuff", but where our mind was at at the time)

Thanks for publishing this proposal!

I would like to point out that adding on-device based models could potentially harm privacy and security of users so I think that the crawlers-based approaches that you suggest should be preferred.

Thanks for publishing this proposal!

I would like to point out that adding on-device based models could potentially harm privacy and security of users so I think that the crawlers-based approaches that you suggest should be preferred.

Hi! We are well aware of that research. However, it's the observability of global state that's problematic, not the global state itself. And when looking at the observability you have to weigh it against just letting a tracking vector exist. Often you'll find that removing the tracking vector does more for privacy than avoiding the observability of global state. Ideally, you have neither.

For browsers without comprehensive tracking prevention on by default, cookies, web storage, and HTTP cache are readily available global state that's observable cross-site. I.e. that's where you have to start if observability of global state is something you want to defend against.

The remaining issue is potential bounce tracking by sites that do not get their storage purged, for instance due to the fact that the user is logged in to the site and uses it.

Distinguishing legitimate federated sign-on scenarios and legitimate analytics and affiliate scenarios from permissionless bounce tracking seems quite hard. As it is, the "user interaction" signal currently in ITP seems likely to have both false positives and false negatives with the consequence of making it harder for users to stay logged in to authentication providers they care about.

As @othermaciej points out

We should definitely defend against this kind of pop-up based tracking if it's being exploited. (Unfortunately, it sounds rather similar to pop-up based OAuth flows.)

However, it's the observability of global state that's problematic, not the global state itself.

Observing the side effects that you mentioned in the proposal would be trivial, that's why I proposed to go for the shared state across all users. Since we are trying to address the problem let's try to tackle it once and for good. I feel like moving the tracking capability to a different vector would be less beneficial than removing it completely.

(I think that if there is a local model that changes navigation behavior it will probably always be observable)

IMO tracking users based on a list shared across millions of them would likely be more complicated.

For browsers without comprehensive tracking prevention on by default, cookies, web storage, and HTTP cache are readily available global state that's observable cross-site. I.e. that's where you have to start if observability of global state is something you want to defend against.

I agree something has to be done but not at the risk of harming security. This is not a situation in which the worst case is an ineffective mitigation like it would be for a poorly configured CSP, this is a case similar to the XSS auditor, which introduced XSLeaks in almost every website in existence for a debatable advantage over XSS.

Overall, I think the crawler solution would be more solid. I'm not against this proposal, just here vouching for one of the options you put out 😸

Crawler-based classification has holes too, there may be trackers not detectable from the network position of the crawlers but that are detectable for a given user. (Due to geospecific redirects or even trackers identifying the IP block of the crawlers).

Going back to classification: in this case, let's consider the proposal with a single bit of "classified as a bounce tracker" that puts a site into SameSite=Strict jail. This is pretty hard to abuse. It's detectable only during an actual attempt at bounce tracking, and combining the bits into a usable unique ID requires bouncing serially through many (32ish) distinct domains every time, which is likely to be a prohibitive performance cost. And that bouncing will itself cause all of them to be identified as bounce trackers, so IDs of this form will self-destruct after only a few uses at most. Capping the length of redirect chains is also likely to be web compatible, at a level lower than what would be needed to pull this off.
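
The arithmetic behind this argument can be sketched as follows. The model is simplified: it assumes one bounce is enough to get a domain classified, and the domain names `t0.example` through `t31.example` are made up for illustration.

```python
from typing import Set, Tuple

ID_BITS = 32  # one classification bit per domain, per the "32ish" estimate


def domains_to_classify(user_id: int) -> Set[str]:
    """Encode an ID: get domain i classified wherever bit i of the ID is 1."""
    return {f"t{i}.example" for i in range(ID_BITS) if (user_id >> i) & 1}


def read_id(classified: Set[str]) -> int:
    """Decode by checking, per domain, whether it sits in the Strict jail."""
    return sum(1 << i for i in range(ID_BITS) if f"t{i}.example" in classified)


def read_with_side_effects(classified: Set[str]) -> Tuple[int, Set[str]]:
    """Reading needs a bounce through all 32 domains, classifying each one."""
    value = read_id(classified)
    after = {f"t{i}.example" for i in range(ID_BITS)}
    return value, after
```

In this model the first read recovers the ID, but every later read returns 2^32 - 1 for every user, so the identifier is destroyed by the act of reading it.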

Let me know if there's a mistake in this analysis.

That said, a combo of crawling and client-side detection may be the right balance.

hober commented

@snyderp, would it be fair to conclude from your comments on this issue that Brave would like to see this proposal taken up by the CG?

I'm not sure I understand what the specific proposal is, but I'm very in favor of PrivacyCG working on this problem! :)

I'm not sure I understand what the specific proposal is, but I'm very in favor of PrivacyCG working on this problem! :)

We wanted to share this at an earlier stage than a crisp proposal to see if we can have this community group work its way through the issue and land in a shared solution. So the proposal is to take on the work of defining the vulnerability and enumerating defenses we believe work, possibly settling on a single one.

The outcome could be standards language which conveys that user agents may put restrictions A, B, and C in place to defend against attack X.

Ah, then if the proposal is "let's put our brains together and figure out a good, standards-focused, cross-browser solution to this problem" then Brave is 100% on board

(didn't mean to be a pedant, just didn't know if the proposal was substantive or procedural)

This may need to be written up as an explainer about the problem and the solution space before we start drafting in spec-level detail

hober commented

Would you like time on the next call to talk about this proposal?

I'd hope it's not a broad measure like a periodic purge of cookies for websites without user interaction. Tools like A/B testing for improving the user interface (of one organization) already have a hard time with ITP, and I hope solutions can be focused on the offenders and less on "everyone".

Would you like time on the next call to talk about this proposal?

If you're asking me, then yes, I can talk a bit about it on the next call.

I agree this does sound remarkably similar to OAuth flows, which I think we would want to keep generally, though I can see an argument that removing this behavior is a reasonable push towards proposals like the Trust Token API, though that's a long-term plan. Is the intent here to whitelist specific known-to-be-for-OAuth domains? How are people who are currently attempting to block this behavior handling OAuth?

Are folks interested in having an ad-hoc meeting to discuss this? cc @johnwilander @jackfrankland @pes10k @englehardt @othermaciej @AramZS

If so, please file an issue here.

I would be interested in attending a call on it

@johnwilander are you up for leading this discussion in an ad-hoc meeting?

Sure, I can lead the discussion. Would love help on scribing though.

Thanks @johnwilander! We'll start working on scheduling this.

Ah, actually I should take scheduling over to a new issue. Filed privacycg/meetings#5.

Reminder - ad-hoc meeting on this topic Thursday, April 23rd 10am PDT per privacycg/meetings#5 (comment)

It can impact Redirect AB Tests as well. The case would be two domains owned by a company where they are AB testing a complete design overhaul with a different domain e.g
example.com -> newuiexample.com

If a visitor is chosen to have a redirect of this particular test, then after 1 month of inactivity, cookies would be purged from example.com and then he might not see the redirects.

hober commented

@johnwilander, could you summarize how the call went & let the folks on this issue know what your next steps will be? Here are the minutes from the call.

hober commented

@johnwilander, could you summarize how the call went & let the folks on this issue know what your next steps will be? Here are the minutes from the call.

John?

Thanks for the reminder!

My summary of the virtual f2f call:

  • Federated login flows and SSO flows are sometimes indistinguishable from bounce tracking. We may need an intermediary user confirmation step to disentangle them.
  • Mozilla is about to ship deletion of website data for domains on their block list that haven't received user interaction (including scrolling). Safari already ships this behavior but not based on a list and not including scrolling as user interaction. This means it could be proposed as a standard behavior at some level of abstraction.
  • Brave has done a crawl study and found very little bounce tracking. That may be because of their Chrome user agent string. Safari and Firefox user agents are potential triggers for bounce tracking. Brave is considering interstitials to "reveal" bounce trackers.
  • Chrome would like to see if the Web ID proposal (personal repo) can help disentangle login flows from bounce tracking. They are reluctant to use lists or heuristics to fight tracking.
  • Salesforce worries about domains ending up on block lists and the process of getting them off of there.
  • Scroll worries about interstitials. Would like to see special treatment of domains where the user is logged in, e.g. IsLoggedIn.

Next steps:

  • See if Mozilla and Apple can formulate a proposal on website data deletion for domains without user interaction that can work for both approaches.
  • Engage with SameSite cookie spec authors to see if some kind of SameSite=StrictStrict setting would make sense. Based on casual comments on Twitter, it seems that sites voluntarily adding a same-site redirect to get access to their cookies in a bounce tracking scenario was never considered in the threat model.
  • Set up a Privacy CG + WebAppSec cross-group meeting to discuss how to formalize logins and login state. That discussion should ideally cover IsLoggedIn, Web ID, and HTTP State Tokens (personal repo).

it seems that sites voluntarily adding a same-site redirect to get access to their cookies in a bounce tracking scenario was never considered in the threat model.

The primary goal of SameSite cookies is to stop CSRF attacks. Same-site redirects (even "same page" redirects) to regain cookie visibility when using Strict cookies was a recommendation we discussed that explicitly safe landing pages could use so they had workable cookies while leaving the vast majority of the site protected at the Strict level. (The other main approach would be to use separate Lax identity cookies and a Strict auth token.)

SameSite cookies are not a tracking protection mechanism, so you're right that we did not consider bounce tracking. To aid usability in the threat model we did consider, we explicitly support same-site redirects.

We should definitely defend against this kind of pop-up based tracking if it's being exploited. (Unfortunately, it sounds rather similar to pop-up based OAuth flows.)

Yep, we ran into this too. We've been calling this the classification problem (we are not very good with names :)): it is hard for browsers to distinguish between OAuth flows and non-OAuth flows because OAuth/OpenID has been built on top of low-level primitives.

https://github.com/samuelgoto/WebID#the-classification-problem

On first thought, partitioning the API space between auth and non-auth seemed sufficient, but solving the classification problem with high-level APIs is also insufficient if their implementations are not sufficiently tied to sign-in (for example, with IDP-controlled UIs below) because an attacker/abuser may use them for other non-OAuth purposes, which takes you back to the original problem of bounce tracking.

The best guess so far is that, to address the classification problem, one has to (a) partition the API space (e.g. high-level / intent-specific APIs) but also (b) make the implementation of the API meaningless to use cases outside of the one intended (e.g. implement high-level / intent-specific UIs, for example here) so that a bounce tracker can't use it.

This implies that the browser takes a much larger role in signing-in, mediating most of the exchange (which has its consequences), but seems at first sight to sufficiently address the classification problem.

So, to go back to your original point, I think WebID can, possibly, help bounce tracking by addressing the classification problem (i.e. allowing a browser to distinguish oauth vs non-oauth use cases) and in doing so allowing it to use different policies. Seems insufficient, but perhaps a constructive/meaningful step forward.

There is now a bounce tracking proposal called SWAN (bounce tracking as in trying to track, not preventing it). Details here: https://github.com/SWAN-community/swan/blob/main/data-flows.md

We should revisit the bounce tracking protection topic. Ping @pes10k.

Sounds good, we're just about to announce something related to this too (mostly list based, for v1, with some more interesting follow ups expected shortly after), so revisiting sounds great

@johnwilander @pes10k Thanks for adding SWAN to the agenda for the F2F.

As a member of the SWAN.community I can explain the approach, including how economics, law and engineering disciplines have come together to produce a solution which gives people meaningful control and choice over privacy.

For those that would like to find out more about SWAN.community's approach to privacy ahead of the May F2F we have drop in sessions throughout April and early May with some of the lawyers that worked on it.

See https://event.webinarjam.com/register/10/plqm1hw

@johnwilander @pes10k Thanks for adding SWAN to the agenda for the F2F.

This agenda item is not to discuss SWAN which is not (yet) a work item or proposal in this community group. This agenda item is to discuss protection against bounce tracking.

@johnwilander Understood. If the group would like to know more about the approach to privacy and choice advocated by SWAN.community then I'd be happy to explain more and thereby inform the discussion on bounce tracking.

In relation to the F2F agenda on bounce tracking I'm interested to learn about the harms and the protection required.

It is very clear that this cannot be discussed without also discussing SWAN, since it seems aimed to kill SWAN.

SWAN establishes a legal basis for data processing among controllers, has mechanisms for audit, is aligned to solving the problems of this group, and creates a level playing field concerning competition irrespective of organization size or the other services that an entity operates. I certainly hope no one is working against these objectives or goals at the W3C. I would be extremely concerned if they were.

SWAN.community would welcome the opportunity at next week's face to face to explain SWAN to this group and demonstrate how the work complies with laws and regulators' stated requirements. That is a matter for the chairs of this group who control the agenda. The chairs could co-ordinate with me to ensure representatives from SWAN.community are available when bounce tracking protection is discussed to ensure these stakeholders' views are represented in the discussion.

Only courts can establish a legal basis for data protection.

Technology could help controllers acquire subjects' consent, which could be a claimed legal basis if it meets the strict validity requirements. In some circumstances a claim for the public or legitimate interest basis could be supported by technology enabling subjects' right-to-object.

My understanding of SWAN is that it uses redirection to bypass third-party cookie blocking. The claim is that the resultant data processing is covered by the legitimate interest basis, with the data protection role of browsers replaced by legal contracts between the third-party entities.

This is well beyond the capacity of this group to decide, our focus should only be the technology.

As @johnwilander has said, bounce tracking emerged several years ago and was clearly designed to avoid Safari's original protections against third-party tracking.
A New Method Bypassing Safari's Third-Party Cookies Blocking:

There are also multiple ways that third-party script can record first-party state, and enable its correlation by third-party servers to build a cross-domain tracking vector, as pointed out above by @jackfrankland and others.

Mitigating this probably requires more control over first-party state.

One feature of the SWAN proposal that could be helpful in this is the ability to communicate first-party acquired user consent state to third-parties, removing the need to bombard users with repeated consent panels or storage access prompts. To avoid the tracking risks this state has to be restricted to low entropy values, first-party located and only browser triggered, but with the ability to communicate it site-specifically to embedees via a request header.

Maybe we could discuss this in the F2F.

Only courts can establish a legal basis for data protection.

If this group only focused on technology, then it sounds like you are saying we should drop Safari's proposals to enable these "safeguards" as they merely force a legal issue we have no right to decide. But I am confused because you then say that SWAN and not Safari is in the wrong for bringing up a legal issue? That seems contradictory to me.

Why were Safari's original 'safeguards' deemed necessary? Wasn't that determination out of scope and inappropriate?

I think it is improper form to make the claim that SWAN is bypassing Safari, or Safari is bypassing SWAN. Both are true, they disagree about what privacy means, and I agree that we should 100% discuss this at the F2F.

@TheMaskMaker, would you be willing to share who you are and your affiliation? It's good to understand who's represented here and whose viewpoints are being shared.

@johnwilander Of course, in fact if I haven't already managed it, it would help me greatly if you could advise me how to link my W3 account to this working group; I thought I already did but the interface has been giving me no peace! I picked the name 'mask maker' back in my younger days on account of the hobby of making costume masks, and now I wish I had just made a different account.

On a separate note, I have a bit of an awkward next comment; in order to explain a concern I have I need to mention Apple/Safari in relation to a big privacy threat and I want to be clear it does not reflect on my opinions of you at all, as I can tell you are working hard like the rest of us to improve privacy.

If anything maybe you can give me more insight there.

@michael-oneill
Let me give you an example of one of my concerns over this bounce-tracking prevention proposal:

The United States government's Federal Trade Commission's webcast on 'dark patterns' (deceptive tactics) on the web called out, as a threat, Apple's Safari's ability to track users through integrations with iOS that, to my knowledge, do not include the use of cookies or bouncing. If we really do intend to wipe out all user tracking, Apple may wish to make a declaration that it will cease or has ceased all AppleId tracking, profiling, ad sales of user data, and customization through user profiling. And this should be just as enforced and auditable by the community as anything else we do. I do not believe that has happened.

If Safari still plans to use Apple IDs through iOS-Safari integrations then this bounce tracking protection does nothing to prevent user tracking, it just monopolizes it for the browser and hides it from the user and web publishers. SWAN, if adopted by Apple/Safari, would expose that data, and allow the user to see it, opt out of it, even delete it.

Bounce-tracking prevention, if adopted by safari, would not. It would only prevent SWAN. Which would prevent this level of user control.

I don't see SWAN as trying to bypass privacy, I see that as trying to enforce it.

@johnwilander I also want to be clear that I am not at this time supporting swan over your proposal either. I like some aspects of it.

In fact my ideal case is if you and James could work together.

I think the transparency and control aspects of SWAN would be a huge boon for users and keep the market open, and combined with browser safeguards, and agreements from higher powers like Apple/Google, we could achieve a privacy solution better than either of you are likely to come up with or be able to implement if the browser and adtech are at odds and both trying to cheat whatever systems we come up with.

This issue is about bounce tracking protection and collaboration in this community group to achieve bounce tracking protection. It's not about other kinds of tracking or tracking protections. Please refrain from discussing other things than bounce tracking protection here since doing so makes it harder to stay focused on what this proposal is about. You can file your own issues for things you'd like to discuss or propose as work items. Thanks!

There is only one mention of Apple in the transcript of the FTC's dark pattern and it is in support of Apple's removal of the IDFA. There are public statements from Safari engineers clearly indicating that history in Safari is synced E2EE and therefore off limits to Apple — as it should be.

It is overwhelmingly clear that users expect their browsers to protect their data: 89% want their browser to prevent their data from being shared (source: Eurobarometer). Safari is acting in direct support of its users and the work the Safari team is doing plays a direct role in convincing users to choose Safari and Apple products. There are many ways in which these user expectations can be violated and the protections implemented to support them can be deliberately circumvented. Bounce tracking is one. Closing that loophole is a natural step forward.

The Times definitely supports moving this forward. Avoiding this kind of circumvention aligns with our readers' expectations, creates a more trustworthy Web, is better for publishers' businesses, and opens the door to a more competitive ad market with fewer network effects in the valuation of data.

There is only one mention of Apple in the transcript of the FTC's dark pattern and it is in support of Apple's removal of the IDFA.

[EDIT]
I went back and read it to check what I remember watching, and while you are right about that particular mention being initially positive (the words "potentially inherently manipulative" are later used to describe the consent mechanisms themselves, though I believe this is a more general comment), they do describe it as an opt-in system that still enables user tracking. This proposal would still prevent tracking for competitors with or without consent that Apple can do with consent.

Also the pattern is described as dark most certainly as a negative in reference to Google and similar login systems. I'm glad Apple is heading in the right direction, but the point is the tracking is still there. Thus the competition concern exists.
[END EDIT]

89% want their browser to prevent their data from being shared

I have read otherwise. https://iabeurope.eu/all-news/iab-europe-news/latest-research-shows-eu-citizens-understand-and-appreciate-the-ad-supported-internet/

And regardless you are talking about a proposal that gives users more control not less. They can choose to not share it. I don't understand your objection

@johnwilander I don't think it's appropriate to call this out of scope.

If a proposal kills tracking for some businesses and not others, that is in scope as it violates W3 rules for anti competition, especially if the proposer's company directly benefits. I made it clear this is not an attack on you, and it is certainly a legitimate concern.

Also if your proposal is designed, as you yourself claim, to kill another proposal, then that is also in scope.

And whether or not this protection is needed as written or does more harm than good is most certainly in scope.

Please let me know what here specifically you disagree with as being in scope?

I filed this issue and the scope is bounce tracking protection. If you want to discuss other topics, please file your own issues. Re-using existing issues for tangential or related discussions doesn't help. On the contrary, people interested in bounce tracking protection might drop out of this conversation because there's too much noise.

I watched the entire event and I don't recall anything even remotely resembling a mention of Safari tracking. That would be major news that happens to be right in my area of research. You seem to be drawing a highly contrived conclusion tying a specific set of dark patterns that apps may use, then noting that Safari is an app, and from there concluding that they're doing it. The idea that the FTC is not providing proper transcripts is not serious.

Regarding what users want, the situation is pretty clear. I'm citing a study by an established and highly trustworthy official statistics agency that asked users whether they want their browser to prevent their data from being shared: they do. You're citing a lobbying group that put a completely disingenuous false dichotomy in front of users — do you want to do it the IAB way OR do you want everything to become expensive? Let's not pretend that there's any kind of equivalence here.

You're confusing "control" and "being forced to deal with it". Having my browser match my expectation is greater control precisely because I never have to deal with it. We have a great mechanism to address niche needs: browser extensions. For the small minority of users who do want their browser sharing data, that's a perfect way to enable this. A few clicks to install and they can share their data whichever way they want to.

Anyway — I just came here to support this work. Support it I do.

@johnwilander I feel this has become contentious, and that you do not plan to explain how my comments are out of scope. I think it would be best if we took a break for the moment.

@darobin I see and respect your support. I am afraid, however, that much of what you object to in what I say is not of my making; rather, it is the W3C process that I am trying my best to follow. To give examples:

You are right that I am speaking of "may", and there is a good reason. "May" rather than "does" is the threat model dictated by most proposals being discussed here at the moment, and I am using their methodology; it is not one I chose. "May" is considered a threat even if it doesn't necessarily happen, or happen today. This is why I only go so far as to say "may". Also, I think before you posted I did re-read the transcript and modify my post (clearly marking what was changed), as you were in some ways correct but in some ways, I believe, mistaken. I had also watched that segment.

Browser extensions to share data in this way would also not be allowed under some theories of privacy, namely the one being discussed here, as they seek to eliminate tracking even with consent and make it 100% impossible.

Additionally, I don't think it's fair to expect most users to know about the threat of paywalls and walled gardens if they are not told. I didn't even know. You criticize the IAB quite harshly, but I have read other articles as well, and this was the one I had on hand in a Slack post from a colleague.

Browsers are on a path to eliminate vectors for cross-site tracking. The Privacy CG's Storage Partitioning Work Item is an example of the community working to identify and define current vectors that enable such activity so that browsers can directly address them.

Bounce tracking is another vector that can currently be used for passive cross-site tracking of users, so it's appropriate to discuss what undesirable primitives it enables and, where the primitives run counter to the stated user privacy-protecting goals of many user agents, discuss possible solutions to solve them.

There are many existing solutions that build on top of browser APIs in ways that may no longer be viable as browsers work together to evolve the web platform's design to protect users by default. This reality is why new APIs which aim to solve existing use cases while aligning with the privacy-protecting-by-default platform are being proposed and discussed.

We should focus our conversation for this issue on what technical mechanisms are viable to address the non-privacy-preserving aspects of bounce tracking. For use cases and scenarios that may be impacted by this work, I would recommend interested parties evaluate making separate proposals outside of this issue for new, privacy-preserving mechanisms to support those scenarios; this will enable those proposals to have their merits discussed in a more dedicated and intentional way.

@TheMaskMaker Sorry but I don't understand what you are trying to say about the W3C process. If you have a specific process-related point, please cite the relevant Process document section.

This is not about "theories of privacy". It's an established fact that interruptive prompting for data, like SWAN does, does not lead to people making informed decisions. Preventing such tricks from being used aligns with user interest. If some people want to go out of their way to install an extension, then we can talk about informed consent. It's also a more sustainable solution than the kind of deliberate circumvention of defensive measures that SWAN is built on.

And the alternative to a data free-for-all isn't paywalls, it's a healthier ad ecosystem less dominated by the big tracking companies.

@darobin Are you stating that most people cannot make informed decisions, hence we should remove choice? Or rather, that data collection and processing of personal data is a complex topic that most people do not spend time to educate themselves about and hence it is too difficult for non-technical audiences to understand the choice being prompted?

Big "tracking" companies often track on user behavior on their own properties, which ought to raise similar concerns about people understanding what data is collected and how it is processed.

I would love to hear your proposal on how small publishers or new startups ought to be able to compete against larger, established companies, when both personalization and advertising use cases often benefit from having access to greater amounts of input data (even when such data is deidentified)?

If this group only focused on technology, then it sounds like you are saying we should drop safari's proposals to enable these "safeguards" as they merely force a legal issue we have no right to decide. But I am confused because you then say that swan and not safari is in the wrong for bringing up a legal issue? That seems contradictory to me.

Of course we can discuss the legal aspects of privacy, probably should, but we should not pretend any technology that emerges, even from a standards body, "establishes a legal basis for data processing among controllers". Only legal process can do that.

Techniques to protect privacy are clearly in scope, but the main thrust of SWAN seems to be to bypass existing protections.

I was recently pointed to a case study titled The Promise and Shortcomings of Privacy Multistakeholder Policymaking, which covers W3C and DNT. I’m trying to learn from this and urge others to read it as well.

@michael-oneill My reference to establishing a “legal basis” is to the contractual framework for data sharing created under SWAN.community (SWAN.co) model terms to ensure all parties play by the same rules. It is not a reference to a “lawful basis” under the UK/EU GDPR. Data protection regulators in particular encourage controllers to put in place data sharing agreements to provide certainty and clarity regarding each party’s obligations and in this way SWAN.co’s model terms support (not replace) controllers’ direct obligations under the law.

Preferences in SWAN are designed to be based on the lawful basis of consent for UK/EU GDPR purposes. There is an ongoing debate (even amongst regulators) as to whether certain “subsequent processing” of cookie data can be based on legitimate interests e.g. for measurement purposes. SWAN.co can support the optionality for both bases but challenges the prevailing view that individuating purposes of use into several technical use cases achieves the “informed” consent standard demanded by the UK/EU GDPR. Several studies indicate users are not engaging effectively with the current typical arrangements, to the detriment of their privacy.

SWAN’s starting position is that different types of cookies are neither “good” nor “bad” per se. Used responsibly, and with the individual end user’s meaningful consent, they have an important part to play in the preservation of the Open Web and the mission of the W3C.

As the TAG recognized recently in their First Party Sets report, there is a conflict of interest where a browser is owned and operated for the owner’s commercial interests not the end user interests. This is noted to be contrary to the W3C’s design principles – see Put user needs first (Priority of Constituencies).

It is also worth reflecting on the W3C Design Principles that should be in line with the objectives of privacy protection where users are given a meaningful choice:

If a trade-off needs to be made, always put user needs above all.

Similarly, when beginning to design an API, be sure to understand and document the user need that the API aims to address.

The internet is for end users: any change made to the web platform has the potential to affect vast numbers of people, and may have a profound impact on any person’s life. [RFC8890]

User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.

Like all principles, this isn’t absolute. Ease of authoring affects how content reaches users. User agents have to prioritize finite engineering resources, which affects how features reach authors. Specification writers also have finite resources, and theoretical concerns reflect underlying needs of all of these groups.

See also:
The web should not cause harm to society
[RFC8890]

SWAN.co provides a legal framework to achieve the goals of all good actors who seek to provide people with choice, meaningful alternatives to choose from and hence enable them to preserve their privacy. The position taken by SWAN.co mirrors that of the UK’s ICO and CMA.

It is essential that this group can debate technical, legal and economic issues if fully thought through solutions to problems are to be found.

but the main thrust of SWAN seems to be to bypass existing protections.

The main thrust of SWAN.co is to improve people’s privacy and provide them with choice on the web which I believe is the focus of this group and the W3C. What are the “existing protections”? Is this a reference to existing ways that browsers are protected in becoming enhanced controllers rather than user agents?

SWAN.co’s initial technical implementation does not bypass anything. I’m not even sure how one would bypass computer code implemented in a browser. It builds on long established standards of interoperability to share data among legally bound data controllers and processors in support of legal business operations. Other commentators such as @TheMaskMaker acknowledge this.

I don't see SWAN as trying to bypass privacy, I see that as trying to enforce it.

@johnwilander Could you provide a link to information that this issue is trying to solve and the problems you see which require “bounce tracking protection”? This might then help us establish use cases that will be impacted.

It is the intention of SWAN.co to make a proposal to establish data sharing within the web browser based on data controller and processor relationships in the future. See the following slide from the presentation provided to the Improving Web Advertising (IWA) BG on SWAN.co in April.

[Image: slide from the SWAN.co presentation to the Improving Web Advertising (IWA) BG]

Privacy by design principles have been followed, including the use of pseudonymous identifiers that support the right to be forgotten.

Based on recent TAG reports on First Party Sets (FPS) such a proposal will require the security architecture of the web to evolve to recognize the legal relationship between controllers and processors rather than just domain = single data controller. Google and others who support FPS all appear to be seeking a similar change. Perhaps @hober who chairs this group and is a TAG member could comment on current TAG thinking in this regard? Has thinking evolved since your blog of October 2020?

Given my experiences of the W3C it seems like the required work from TAG, then the subsequent proposals, debates, trials and then mass deployment will take at least 24 months.

The publishing and advertising industry and their suppliers are working towards an effective deadline of October 2021 to no longer rely on so called third-party cookies. If they cannot find a solution open web publishers will be further threatened as advertisers direct their spend to more stable, less risky and high ROI alternatives. This is explained in the CMA report I referenced earlier.

Solutions are needed that will work today at scale. It would be helpful if Apple can confirm that they will not interfere with the long-established primitives defined in long-established technical standards that support the legally compliant SWAN solution until such time as a viable alternative is defined and widely deployed. Perhaps representatives from other browsers could also provide similar assurances to enable a rational and productive debate to occur? No one responds well when the industry they work in or support is being threatened without a rational or legal basis and they are being discriminated against based on the actions of a minority.

@darobin

It's an established fact that interruptive prompting for data, like SWAN does, does not lead to people making informed decisions. Preventing such tricks from being used aligns with user interest.

I’m confused by this comment. SWAN.co interrupts the user once and so long as the data is retained by the web browser it will not do so again until 90 days have elapsed. People can update their preferences at any time, or using private mode wipe them when the browser closes. Where are the “tricks”? I see none. Compared to current practices that lead to “consent fatigue” SWAN.co seeks to avoid interrupting users every time they visit a website and explaining complex laws and many data controllers, processors and purposes; SWAN.co’s approach seems preferable. No one is forcing you or any publisher or advertiser to use SWAN.co. It is important to ensure those that wish to use SWAN.co can do so freely.

Without SWAN.co the alternative will be for website operators to ask people to provide directly identifiable personal information, such as email addresses or logins, every time they visit a web site. Not all website operators will have the scale or brand presence for such a solution to be remotely viable for them. In any case I don’t understand how that will yield a privacy improvement in practice.

Until an alternative is found, SWAN.co, or a solution like it, will be needed by many participants in the web ecosystem.

@TheMaskMaker you make some interesting points. Competition issues in digital markets and self preferencing are all related to the W3C antitrust guidelines. I would like these urgently updated to protect participants in debates such as this one. I have raised an issue in relation to the W3C process which is currently with the W3C AB. You might wish to direct your comments to that issue. I agree with @johnwilander that we should focus this thread on the issue of so-called bounce tracking and the impact of any changes.

I am stating that we should not encourage choice architectures that work against users. This isn't new, it's been a constant area of improvement for the Web over the past two decades, going back to the ActiveX security model that gave users the "choice" to install viruses from the Web. The goal is not "choice", the goal is increased user agency. Choice will often decrease agency, particularly when it is interruptive or decontextualised — exactly as it is with SWAN (and making that a sticky decision makes it worse, since a decision made under a bad architecture is thus rendered persistent). Consent is useful where appropriate, but it should be rare, slow, difficult, specific, and very temporary. The typical case is consent to the processing of sensitive information provided as part of the collection form, and only for a specific case. Consent to persistent, non-specific, pervasive, and durable processing does not meet these criteria.

Yes, big tracking companies are a problem. It's not a problem that will be solved by maintaining the same status quo that brought them into existence. "Let's do more of exactly the same" does not strike me as a particularly promising plan.

I don't have a magic pony for "how small publishers or new startups ought to be able to compete against larger, established companies" for the very simple reason that no one does. But I do know that we can make the market support more competition or less competition. There are network effects in the use-value of data such that sharing it more disproportionally favours those who already have data. Today a small publisher (and we are all small publishers in the current game) is devalued because it cannot develop scarcity for its audience. We gave your way a try — it really, really failed. It's time to change that.

When a user intentionally clicks a link to view content on a different site, the browser should of course allow top-level navigation to occur. If the user has visited the site they're navigating to before, and, depending on the jurisdiction, has consented to being remembered, it seems reasonable for the site to access data related to the previous visit before displaying meaningful content to the user. The ability to access storage data partitioned to the origin/site upon top-level navigation is required for this flow.

Use of this mechanism for anything else, I believe, is an unintended side-effect of the functionality. There's especially a strong argument for user agents to consider bounce-tracking, or the top-level redirection through several domains that display no meaningful content for the purpose of data collection, an exploit, no matter how good the intentions are.

Hi everyone-- one of the Privacy CG chairs here. As mentioned in my last comment, this issue is dedicated to discussing the ways in which bounce tracking allows passive cross-site communication and potential technical solutions for addressing it.

At this point, we're veering way off topic. These side conversations are all reasonable to have in other forums, but not on this issue specifically. We'll unlock this issue early next week after which point we hope to see folks respect the request to remain on topic.

If you have concerns, please reach out to the Privacy CG chairs at group-privacycg-chairs@w3.org.

Thanks!

A Defence Against Bounce Tracking

While it may be possible for user-agents to recognise bounce-tracking (or, more generally, the use of redirection to hide cross-domain storage access) and block the servers that use it, the "arms-race" is bound to continue.

Determined trackers will develop more obscure and sophisticated techniques and, while browser companies may be prepared to continue developing defenses, the resulting side-effects can sometimes end up damaging the web platform for the wider ecosystem.

There needs to be a simple and more direct way to ensure users are protected from cross-domain tracking.

One way to do this, as pointed out at the top of this thread, is to purge all of a top-level site's data, both data local to the top-level origin and data partitioned to its embedded subresources, after the user stops interacting with the site.

Since no state information is persisted outside the expiry "window", all tracking techniques based on client-storage would be impossible.

Fingerprinting attacks may still be possible of course, but that is another story, and defenses are already being built up against that.

There already exists an API, Clear-Site-Data, whereby servers can request via a response header that all their client-side data is deleted, and user-agents could simply arrange to execute this action some default time after the user stops interacting with a site.

A timer would be (re)started on every user interaction, and data purged on the next visit after it completed, as if the browser had received a Clear-Site-Data header on the next access to the top-level origin. All the client-accessible site data can be removed this way: cookies, local and session storage, IndexedDB storage, execution contexts and cache. Sites could then request that the browser extends the default timeout, which it would only do after prompting the user for consent.
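To make the timer behaviour above concrete, here is a rough sketch of how I read it. It is only a model: the names (`SiteState`, `recordInteraction`, `onVisit`) and the 24-hour default are my own assumptions, not part of the proposal or of any browser API.

```typescript
// Hypothetical model of the inactivity purge; all names and the default are assumptions.
interface SiteState {
  lastInteractionMs: number;        // time of the most recent user interaction
  timeoutMs: number;                // current retention window for this site
  storage: Record<string, string>;  // stand-in for cookies, local storage, etc.
}

// Assumed value; the proposal deliberately leaves the default duration open.
const DEFAULT_TIMEOUT_MS = 24 * 60 * 60 * 1000;

function newSiteState(nowMs: number): SiteState {
  return { lastInteractionMs: nowMs, timeoutMs: DEFAULT_TIMEOUT_MS, storage: {} };
}

// Every user interaction (re)starts the timer.
function recordInteraction(site: SiteState, nowMs: number): void {
  site.lastInteractionMs = nowMs;
}

// Called on the next access to the top-level origin. If the timer has completed,
// behave as if a Clear-Site-Data header had been received. Returns true if purged.
function onVisit(site: SiteState, nowMs: number): boolean {
  const expired = nowMs - site.lastInteractionMs >= site.timeoutMs;
  if (expired) {
    site.storage = {};
  }
  return expired;
}
```

Note that in this sketch a visit alone does not restart the timer; only an interaction does.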

The default duration should be a value low enough to make the persisted state inadequate for tracking, but high enough to allow inter-site navigation and persist other expected state for average user interaction within a single "session".

Sites with functionality requiring top-level state to be persisted for longer would be able to call for it using a suitably extended version of the Storage Access API requestStorageAccess() method, with an additional parameter to convey a new expiry duration, only to be taken account of when the method is executed within a user gesture, and only in a top-level browsing context.

As in the case of embedded browsing contexts requesting access to their own top-level data, the user-agent would prompt the user and ask for their consent before implementing the new duration.
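The conditions under which a longer duration would be honoured could be sketched roughly as below. This is a hypothetical decision function, not the Storage Access API: `RetentionRequest`, `PromptUser` and `resolveRetention` are names I made up for illustration, and the prompt is reduced to a callback.

```typescript
// Hypothetical decision logic; none of these names exist in the Storage Access API.
interface RetentionRequest {
  requestedMs: number;        // the new expiry duration the site asks for
  inUserGesture: boolean;     // was the call made within a user gesture?
  isTopLevelContext: boolean; // was the call made from a top-level browsing context?
}

// Stand-in for the browser's consent prompt: returns true if the user agrees.
type PromptUser = (requestedMs: number) => boolean;

function resolveRetention(
  currentMs: number,
  req: RetentionRequest,
  promptUser: PromptUser
): number {
  // The new duration is ignored unless the call happens within a user gesture
  // and in a top-level browsing context.
  if (!req.inUserGesture || !req.isTopLevelContext) {
    return currentMs;
  }
  // The user agent asks for consent before applying the new duration.
  return promptUser(req.requestedMs) ? req.requestedMs : currentMs;
}
```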

An argument against implementing a default restriction on first-party storage like this would of course be the risk of breaking existing functionality, but a transition could be managed by only triggering the new limit on first-party storage when a
Permissions-Policy: request-storage-retention=() header is present in the response to a top-level origin request.
The permissions capability is already proposed for Storage Access (but for now only so that embedded contexts can request access to their first-party cookies with the request-storage-access feature).
This would ensure that only sites that were prepared for the new storage access functionality would be subject to it.
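A browser-side check for this opt-in might look something like the sketch below. It only tests whether the feature name from the header above appears in the response header; the parsing is deliberately naive (a real implementation would parse the header as a structured field), and `optsIntoRetentionLimit` is a made-up name.

```typescript
// Feature name as written in the proposal above; it is not a registered policy feature.
const RETENTION_FEATURE = "request-storage-retention";

// Naive check: returns true if the Permissions-Policy header value names the feature.
// Assumes simple directives of the form `name=(...)` separated by commas.
function optsIntoRetentionLimit(permissionsPolicy: string | null): boolean {
  if (permissionsPolicy === null) {
    return false;
  }
  const directives = permissionsPolicy.split(",");
  for (let i = 0; i < directives.length; i++) {
    const name = directives[i].split("=")[0].trim().toLowerCase();
    if (name === RETENTION_FEATURE) {
      return true;
    }
  }
  return false;
}
```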

A useful amelioration would be to allow a longer persistence for defined low-entropy "session" state data, which could be used to stop the rendering of "cookie consent" or other banners or popups once a user had rejected them. This would have to be carefully designed to avoid it being employed as a persistent tracker; for example, it could be a single named cookie with an entropy-restricted value and a moderately short duration, which would then not be subject to the default data purge action.

An example could be:
Set-Cookie: __state=A;Max-Age=70000
where the name is a fixed constant and the value a single hexadecimal digit.

There could only be one of these cookies per origin because of the risk of carrying high-entropy values in a combined set of low-entropy names.
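As a sketch of how a user agent might decide whether a cookie qualifies for this exemption: the fixed name and single hex digit come from the example above, while the one-day cap on Max-Age and the function name `isExemptStateCookie` are my own assumptions.

```typescript
const STATE_COOKIE_NAME = "__state";  // fixed name from the example above
const MAX_STATE_AGE_SECONDS = 86400;  // assumed cap; the proposal only says "moderately short"

// Returns true if a Set-Cookie value matches the low-entropy state cookie shape.
function isExemptStateCookie(setCookie: string): boolean {
  const parts = setCookie.split(";").map((p) => p.trim());
  const pair = parts[0].split("=");
  if (pair.length !== 2 || pair[0] !== STATE_COOKIE_NAME) {
    return false;
  }
  // One hexadecimal digit carries at most 4 bits of entropy.
  if (!/^[0-9A-Fa-f]$/.test(pair[1])) {
    return false;
  }
  for (let i = 1; i < parts.length; i++) {
    const attr = parts[i].split("=");
    if (attr.length === 2 && attr[0].toLowerCase() === "max-age") {
      if (!/^[0-9]+$/.test(attr[1])) {
        return false;
      }
      const age = parseInt(attr[1], 10);
      return age > 0 && age <= MAX_STATE_AGE_SECONDS;
    }
  }
  // No Max-Age attribute: treated as not exempt in this sketch.
  return false;
}
```

The one-per-origin rule would have to be enforced separately, for instance by only ever honouring this exact name.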

Other exemptions for storage not to be subject to the purge might be added so that necessary, but privacy-preserving, functionality might be retained - always with the proviso of course that they were not capable of being co-opted by "bad actors" for tracking.

In preparation for the F2F discussion I'm trying to find information on the problem that is attempting to be solved in this issue. @johnwilander please could you provide a link to this information?

The technical description of the problem in its original form is in the issue post under “Bounce Tracking.” If you are looking for a description of what cross-site tracking is, that’s out of scope for this issue but the threat model work in PING may be able to help you. If you are looking for information on why browser vendors want to prevent cross-site tracking, I’ll have to defer to each vendor’s tracking prevention policy.

I think you are referring to this privacy threat model document. Correct? The document is labelled as follows.

Not Ready For Implementation
This spec is not yet ready for implementation. It exists in this repository to record the ideas and promote discussion.

I was hoping for concrete details of actual harms that you are seeking to address with this issue and who the perpetrators of those harms are. Does such support exist?

I see the start of a pull request to outline the different positions. Perhaps as a next step this could be expanded to outline the specific harms and actors in relation to bounce tracking?

In relation to browser vendors' policies, I don't see how they are relevant to this group or this issue. We're focused on improving privacy in practice, not coordinating the implementation of various web gatekeepers' commercial policies.

I was hoping for concrete details of actual harms that you are seeking to address with this issue and who the perpetrators of those harms are. Does such support exist?

The issue does not need to state "actual harms." There are browser vendors who want to prevent cross-site tracking and bounce tracking is an instance of cross-site tracking. Hence, it's useful to discuss if and how we can standardize prevention of bounce tracking.

If you don't want to discuss how to prevent bounce tracking but rather discuss something else, I suggest you open a different issue in a suitable group within W3C. This is a work item proposal to jointly work on how to prevent bounce tracking.

The charter of this group states the following as the mission.

The mission of the Privacy Community Group, motivated by the W3C TAG Ethical Web Principles, is to incubate privacy-focused web features and APIs to improve user privacy on the web through enhanced browser behavior.
The group welcomes participation from browser vendors, privacy advocates, web application developers, and other interested parties.

As someone who is part of three of these four participation groups, I’m keen to understand how this proposal, like any other proposal in Privacy CG, improves users’ privacy in practice.

@johnwilander

There are browser vendors who want to prevent cross-site tracking and bounce tracking is an instance of cross-site tracking.

Those of us that are not browser vendors need to know very clearly why browsers want to prevent cross-site tracking. We need to understand justifications beyond individual corporate policies or nebulous references to privacy so that this issue can be debated rationally by all four participating groups.

However, if this group is in fact a forum for web browser vendors to co-ordinate their changes then as a minimum the charter should be revised.

@michael-oneill I may have misunderstood the proposal but it seems like this has the substantial potential to severely break most identity single-sign-on flows. For example, both SAML and OpenID Connect use redirect mechanisms with link decoration to support single-sign-on across disparate domains. These flows can contain explicit user behavior (i.e. consent messages) but can also be silent (e.g. the user has already given consent to share their identity between the IDP and the site).

If I understand the proposal correctly, then it would be easy for the IDP to be classified as a "bounce tracker" and all the user state cleared from the IDP site. This would not only force the user to login much more frequently creating a poor user experience but it would exacerbate it by removing all trust the IDP has with the browser and the user forcing the user's sign-in experience to be even more complicated (e.g. password + SMS verification).

The reason is that IDPs today store some state in the browser instance regarding userX signed in to the IDP. This then enables the IDP to assert some level of trust to the browser instance. If all such state is removed, then the IDP MUST treat the sign-in attempt as "un-trusted" and use the more complicated user authentication flows.

Are there provisions in the proposal to support identity providers? Or do you have any specific recommendations?

@gffletch the data purged by my proposal would only be top-level + partitioned, the non-partitioned subresource data would not be affected. The existing Storage Access API could still be used to let SSO subresources see their (first-party) cookies etc.
I think credentials needing to be stored in the top level could be given an exemption (from being purged) by the browser recognising the SSO flow; I will give that some thought.

First, @michael-oneill and @gffletch, what you are discussing is already an active topic in the IsLoggedIn work item. See https://github.com/privacycg/is-logged-in#why-do-browsers-need-to-know and privacycg/is-logged-in#15.

Second, there is no difference between first-party cookies and the cookies a third-party gets access to through the Storage Access API. So I don't understand the distinction you make between "top-level" and "non-partitioned" data, Michael. Are you suggesting data is hidden from the first-party and only accessible when a third-party requests it through the Storage Access API?

Thanks John. By "top-level" I mean storage accessible to the top-level frame, e.g. the site's own first-party cookies and everything controllable by Clear-Site-Data.
Currently the SAA allows embedees to request access to their own first-party cookies. I am suggesting that top-level data (+ embedee data partitioned to the top-level site) is purged after a no-activity timeout, and that the SAA is enhanced so that top-level contexts can request that the timeout be extended (by triggering a user prompt via an extension of requestStorageAccess).
I also suggested a transition period during which the purging would only apply if sites sent an appropriate Permissions-Policy response header.
I was saying to George that my proposal does not change the situation for SSO, and I agree IsLoggedIn is a better issue for that discussion, though work there could lead to an exemption (from being purged) for credentials that need to be stored at the top level.
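A rough sketch of the purge-and-extend idea described above may help, under stated assumptions: the proposal gives no timeout value, so the 7-day figure, the `SiteState` shape and both function names are hypothetical, and the user prompt is modelled simply as a call that records an extension.

```typescript
// Hypothetical per-site bookkeeping a browser might keep.
interface SiteState {
  lastUserActivity: number; // ms since epoch of the last user interaction
  extendedUntil?: number;   // set when the user approved an extension
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Assumed value for illustration; the proposal does not specify one.
const NO_ACTIVITY_TIMEOUT_MS = 7 * DAY_MS;

// True when top-level (and top-level-partitioned) data would be purged.
function shouldPurge(site: SiteState, now: number): boolean {
  if (site.extendedUntil !== undefined && now < site.extendedUntil) {
    return false; // a user-approved extension is still in force
  }
  return now - site.lastUserActivity >= NO_ACTIVITY_TIMEOUT_MS;
}

// Models the outcome of a successful user prompt: the purge is postponed.
function grantExtension(site: SiteState, now: number, durationMs: number): SiteState {
  return { ...site, extendedUntil: now + durationMs };
}
```

In this model a site the user has not touched for the timeout period loses its top-level data unless the user has approved an extension that has not yet expired.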

pgl commented

Hi there. I hope it's OK to chime in here.

How about some sort of way of notifying users that bounce tracking is occurring? It wouldn't even need to require user interaction to allow (although perhaps that could be configurable), just some way to tell people it's going on and the option to proactively block sites if wanted.

I think in the case of SSO authentication, this wouldn't need to happen often so the notifications wouldn't be too annoying. Perhaps there could be an option for the user to say "this is OK from that site". In other cases, especially when it's happening a lot, informing the user would at least tell them something weird's happening.

Detection of all cases would be easier than trying to selectively block good and bad cases. It doesn't "solve" the issue but it goes some way towards giving the end user more awareness.

There are cases where I want to redirect to a new URL due to service termination or service integration with another company. This is a legitimate redirect use case that does not involve tracking.

However, in the case of a company that operates many services using subdomains, the number of redirects from the same domain will be large, and the company will be limited by the bounce tracking protection even though it is not tracking.

Portal sites often move services between domains. The redirector count of 10 poses a risk for them, and I think we need a way to recognise legitimate uses that is not ad hoc.
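To illustrate the concern, here is a small sketch of how redirects from many subdomains could add up against a single site. Only the threshold of 10 comes from the discussion; how a browser actually keeps the count is not specified here, the function names are hypothetical, and the "last two labels" rule is a deliberately naive stand-in for the Public Suffix List.

```typescript
// Threshold taken from the comment above; everything else is assumed.
const REDIRECT_THRESHOLD = 10;

// Naive site derivation: keep the last two labels of the hostname.
// Real browsers use the Public Suffix List instead.
function naiveSite(hostname: string): string {
  return hostname.split(".").slice(-2).join(".");
}

// Counts observed navigational redirects per site.
function countRedirectsBySite(redirectorHosts: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const host of redirectorHosts) {
    const site = naiveSite(host);
    counts.set(site, (counts.get(site) || 0) + 1);
  }
  return counts;
}

// Sites whose aggregated redirect count reaches the threshold.
function sitesOverThreshold(redirectorHosts: string[]): string[] {
  return Array.from(countRedirectsBySite(redirectorHosts).entries())
    .filter(([, count]) => count >= REDIRECT_THRESHOLD)
    .map(([site]) => site);
}
```

With this kind of per-site aggregation, ten services that each redirect once from their own subdomain would together reach the threshold, even though no single service redirected more than once.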

So we would like to ask that redirects that do not involve tracking not be counted by the bounce tracking protection.

For example, the following.

@johnwilander is everything in this proposal captured by the Navigational-Tracking Mitigations work item or is there something separate here that we should continue to discuss via this proposal?