WICG/turtledove

Addition of an analytics/reporting entity to enable centralised reporting

warrrrren opened this issue · 15 comments

Context

Publishers today can rely on their SSP/Analytics partners to provide detailed reporting on several metrics and analytical tooling outside of the reporting provided by their ad server.
e.g. Prebid supports analytics adapters that receive events notifying them of bid initiation, bidder adapter bids, timeouts, etc., allowing for data analysis in the analytics vendor's interface.
The current PA-API implementation doesn't allow for more than one entity (the top-level seller) to view and thus report on all the component and top-level auctions.

Proposal

The PA-API adds an analyticsURL key to the auction config that takes a URL to a hosted JavaScript file as a value.
The hosted file can contain a function that is called whenever reportResult() is called for each participating component auction and for the top-level auction.

This function's input schema and restrictions can be similar to the existing reportResult().
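To make the shape of the proposal concrete, here is a minimal sketch. Note that `analyticsURL` and `reportAnalytics` are names invented by this proposal for illustration; they are not part of the current Protected Audience API, and the browser signals shown are only assumed to resemble what reportResult() receives today.

```javascript
// Hypothetical auction config as the publisher might declare it;
// `analyticsURL` is the new key proposed in this issue.
const auctionConfig = {
  seller: 'https://top-seller.example',
  decisionLogicURL: 'https://top-seller.example/decision.js',
  analyticsURL: 'https://analytics.example/pa-analytics.js',
  componentAuctions: [/* component auction configs */],
};

// The hosted file could export a function shaped like reportResult(),
// returning only the fields the publisher's analytics partner may see.
function reportAnalytics(auctionConfig, browserSignals) {
  return {
    seller: browserSignals.seller,
    topLevelSeller: browserSignals.topLevelSeller ?? null,
    bid: browserSignals.bid,
    renderURL: browserSignals.renderURL,
  };
}

// Example invocation with reportResult()-style signals:
const event = reportAnalytics(auctionConfig, {
  seller: 'https://component-seller.example',
  topLevelSeller: 'https://top-seller.example',
  bid: 1.25,
  renderURL: 'https://advertiser.example/ad.html',
});
console.log(event.bid); // 1.25
```

The function deliberately returns a plain object rather than performing its own network calls, mirroring how reportResult() delegates actual delivery to the browser.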

Reference to the relevant section of the Chrome response to the IAB fit gap analysis: https://docs.google.com/document/d/10608Tp57alonCiBN9D2-0UfV_C6FFLsV5nh6sBaA5rA/preview#heading=h.uf1ddn79iizl

Specific item on page is "Publisher Revenue Accrual and Impression Validation"

If we do this, we may need to introduce some way for auction participants to indicate they are OK sharing information with the publisher (for the bidder/seller relationship, participating in the auction is considered implicit permission to share the bid, but this change very much alters who can view information previously private in Protected Audiences with a third party), and we'd need to extend the aggregate reporting to include the publisher.

I think we'd only want to call this new putative method once during the reporting phase, not once per reportResult() call? Also note that reportResult() is only called on the component seller with the winning bid (and the top-level seller), not for each of the other component sellers.

Hi Warren, could you please add your name and affiliation to your GitHub profile?

I suspect we need a little more work to define what this reporting entity is trying to do and what information it should get. As Matt says, only the winning component auction gets to run reportResult() now. What kind of reporting goals are you looking for in multi-seller auctions? What about cases where the whole PA auction does not produce any winner at all? And crucially: Are some of these goals better met with more data but only aggregate reporting?

For the permissioning issue that Matt brings up, it seems to me that it's sufficient for the seller to be able to see the auction config (including any new reporting endpoint) in scoreAd. If the seller doesn't agree to disclose information to the publisher's choice of reporting endpoint, they can of course decline to have any winner from the auction. But I don't think the seller should be able to say "I will run this auction but refuse this reporting"; the publisher should be in the position of making it a take-it-or-leave-it kind of deal.

(We would need to decide what happens if the reporting endpoint fails enrollment and attestation: does that make the whole auction fail, or does the auction happen without the 3p reporting?)

I'm worried about a different permissioning question, though: does the presence of a reporting endpoint in the auction config constitute a good enough proof that that really is the publisher's intent? Warren is asking for "an analytics/reporting entity", so not necessarily something going to the domain of the publisher page itself. I don't think we want anyone who gets a chance to run JS on the publisher page to be able to install themselves as if they were a publisher-blessed reporting endpoint. Seems like we should ask for something like a .well-known file on the publisher domain that explicitly lists blessed reporting endpoints.
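The .well-known idea suggested above could look something like the following sketch. The file path, JSON shape, and the check itself are all assumptions for illustration; nothing here is specified in Protected Audience.

```javascript
// What a file such as
// https://publisher.example/.well-known/pa-reporting-endpoints.json
// might contain after parsing: an explicit list of publisher-blessed
// reporting origins.
const blessedEndpoints = {
  reportingOrigins: [
    'https://analytics.example',
    'https://audit.example',
  ],
};

// Before honoring an analytics endpoint from the auction config, the
// browser could verify that its origin appears in the publisher's list,
// so arbitrary JS on the page cannot self-install as a reporting endpoint.
function isBlessedEndpoint(analyticsURL, wellKnown) {
  const origin = new URL(analyticsURL).origin;
  return wellKnown.reportingOrigins.includes(origin);
}

console.log(isBlessedEndpoint('https://analytics.example/pa.js', blessedEndpoints)); // true
console.log(isBlessedEndpoint('https://evil.example/pa.js', blessedEndpoints));      // false
```

Comparing origins rather than full URLs keeps the check robust to the endpoint rotating script paths, at the cost of blessing the whole origin.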

I'm skeptical that adding the information to the auctionConfig passed to the seller is enough here. I think it's reasonable to require that only the seller provide consent (the seller already has that information, so it can decide whether or not it's OK to pass it on), but we're basically passing information solely available to scripts with their own memory spaces to another origin in a way that may come as a surprise to sellers. While I defer to more security-minded folks on whether we need explicit opt-in here, I think we do (just an extra "yes, it's OK to send info to this publisher origin" field would be enough, and otherwise we would ignore the bid).
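The explicit opt-in described above might look like the following sketch. The field name `allowAnalyticsReportingTo` is invented here for illustration, and the filtering is simplified: a real implementation would sit inside the browser's auction logic, not page script.

```javascript
// A seller's auction output could carry an explicit consent field naming
// the origin it agrees to share information with (hypothetical field).
const scoredBids = [
  { bid: 2.0, seller: 'https://ssp-a.example',
    allowAnalyticsReportingTo: 'https://analytics.example' },
  { bid: 3.5, seller: 'https://ssp-b.example' }, // no consent given
];

// Only bids that explicitly consented to the configured analytics origin
// remain eligible; the rest are ignored, per the suggestion above.
function eligibleBids(bids, analyticsOrigin) {
  return bids.filter(b => b.allowAnalyticsReportingTo === analyticsOrigin);
}

const kept = eligibleBids(scoredBids, 'https://analytics.example');
console.log(kept.length); // 1 (only ssp-a opted in)
```

Matching against a specific origin, rather than a boolean, means a seller's consent cannot silently extend to a different endpoint swapped in later.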

Publisher API

part 1: Publisher Monitoring API

I would like to share some ideas and thoughts on the current issue. I have attempted to make my writing brief, concise, and precise. My goal is to contribute meaningfully to the discussion. Unfortunately, it's a bit long.

TL;DR

This issue discusses the Protected Audience API (PAAPI), focusing on the needs of publishers and offering a solution for better transparency and control. The problem statements include lack of transparency, real-time detection, context/data, and control. The proposed solution is a two-part API: a Publisher Monitoring API for collecting and analyzing performance data while ensuring user privacy, and a Publisher Configuration API for setting rules related to monetization. The aim is to align with the privacy sandbox's objectives and set higher industry standards based on transparency and trust.

Overview

The Protected Audience API (PAAPI) is designed around two primary concepts: buyers and sellers. The term "seller" can represent various stakeholders or interests, such as an SSP, an ad server when acting as a top seller, or a property owner (i.e., the publisher).

The points that follow are focused solely on the publisher's needs and aim to provide additional context to the current GitHub issue.

⚠️ This issue doesn't focus on browser-side troubleshooting, nor is it a monitoring API for sellers or buyers, or an event-based JS API like Googletag's SlotRenderedEvent.

Problem statement

(1) Lack of transparency. There's no clear source-of-truth in the PAAPI that allows publishers to independently verify and control that everything functions as expected. The most common use cases include:

  • "Revenue Accuracy": This refers to the consistency between what the SSPs and the Ad Server report versus what happens on the browser side. It's essential to ensure accurate billing and prevent any dysfunctional practices between stakeholders.
  • "Seller Performance": Building an ad stack demands many compromises from a Publisher. It's crucial for them to understand the impact and performance of the PA and its sellers. The ad stack (and therefore the PA) makes up a significant portion of JS code execution and network calls, which, in some cases, may negatively affect the SEO performance of the page but also the generated ads revenue, and ads viewability (fast-ads-matter).
  • "Auction Manipulation": Some sellers may encounter issues leading to abnormal bid prices that win the auction, even without payment. It's typical to see spikes due to technical problems or towards the end of the month.
  • "Last decision call". The top seller plays a crucial role as they ultimately decide which ad will or will not be displayed on the page. It's essential to independently audit these decisions. Currently, the top-seller's decision process remains a black box.

(2) Lack of real-time detection. For property owners, swiftly detecting issues and bugs in their current setup is crucial to avoid significant revenue losses. The most common use cases include:

  • "Technical implementation issues", such as a new release that impacts the PA API or the monetization code. Common errors can result from a developer bug, a breaking release of a library like Prebid, or incorrect settings for auction components.
  • "Seller issues" are more challenging to identify since they depend on external factors like seller-side bugs. Publishers need to detect this type of issue as quickly as possible to implement the right solutions. For instance, why has a bidder suddenly got a 0% bid rate? Identifying a wrong bid pattern, like a seller making a "fixed $99 bid" for several hours or days, is crucial.

(3) Lack of context/data. From a publisher's perspective, audiences are their most valuable assets; they are what publishers sell to the buy side.

  • "Segment performance": Publishers yield their inventory and need to attach first-party and publisher data. They need to understand their audience's performance concerning traffic sources (SEO, social, paid, etc.) and other dimensions. Based on this data, publishers make crucial decisions, such as promoting subscriptions or adapting their floor price strategy. All this information belongs to the publishers, and they don't want to share it with sellers, buyers, or other third parties.
  • "Content and contextual performance": Similar to segment performance, publishers need to analyze the performance of their content, site layouts, user experience, and so on. For instance, the publisher should be able to measure and compare the performance of two articles to identify specific factors.
  • "Benchmarking": The complexity of the stack leads to A/B tests, code monetization updates, library updates, and the addition of third-party partners, which can modify the auction behavior.

(4) Lack of control. The publisher has responsibility for the content, user experience, and ads displayed to the users. Concurrently, there are growing CSR initiatives to limit and optimize the supply path. Publishers need appropriate controls and tools to ensure everything functions as intended (such as type of ads, allowed brand domains, floor price, etc.).

Additional context

(1) Sellers ≠ Publishers

It's important to understand that publishers, buyers, and sellers (including top sellers) have different business interests. Buyers aim to purchase the right inventory or audience at the lowest price. In contrast, sellers and publishers strive to sell inventory at the highest price. Sellers compete with each other to capture a larger portion of the inventory and have no incentive to optimize the entire publisher ad stack, because it doesn't align with their interests.

While publishers are worried about the impact of advertising on their page, both buyers and sellers are less concerned with overall page performance or SEO challenges.

(2) Privacy Sandbox + User Privacy + Publishers

The Privacy Sandbox has two core aims:

  • Phase out support for third-party cookies when new solutions are in place.
  • Reduce cross-site and cross-app tracking while helping to keep online content and services free for all.

At the heart of their operations, publishers produce content for audiences. Their business thrives on understanding and valuing these audiences. When a page on a specific site is loaded, it's crucial for publishers to comprehend their users and the associated performance metrics. This understanding informs various decisions, including monetization strategies and product adjustments (content, subscriptions, user experience, etc.). However, typically, the user's existence is not recognized beyond the site.

It's crucial to understand that protecting user privacy requires adaptation among buyers, sellers, and publishers.

Lastly, publishers tend to be permissive, often implementing minimal checks on third-party partners, which can lead to data leaks (social pixels, user analytics, last fencing technology publishers should test, etc.).

Why it’s an opportunity to provide a solution

  • It is crucial to provide a solution that aligns with the privacy sandbox's initial objectives related to user privacy. This includes adding capabilities to support companies offering "online content and services free for all".
  • It is essential to provide the right tools and audit capabilities to ensure transparency, as this is vital for publisher and industry adoption. Introducing a framework perceived as "another brand-new black box" presents a significant limitation.
  • Header bidding and Prebid allow publishers to independently track (a part of) their revenue using client-side data. If a publisher reverts to relying on sellers' reporting, it represents a significant step backward in terms of transparency and independence.
  • This situation is an opportunity to set higher industry standards based on transparency and trust. It's also a chance to correct some dysfunctional practices. The browser is the only place where a framework can be built to meet all requirements (user privacy, industry constraints).
  • Lastly, it encourages better and fairer competition.

What could the Privacy Sandbox offer?

💡 This section compiles ideas that could potentially form a solution. A limitation is my lack of knowledge about the PAAPI. I readily admit that some of the use cases described above might already be addressed by the current APIs, such as the Private Aggregation API and functions like reportWin or reportResult.

IMO, the issue can be split into two parts, each of which can be covered by an API:

  • The Publisher Monitoring API offers a straightforward way to collect and analyze PA performance. It ensures user privacy standards are met and limits potential data leakage. I will detail some ideas below;
  • The Publisher Configuration API provides publishers with the necessary controls to set rules related to monetization and page configuration. Some of these APIs may be managed by tools provided by sellers (e.g., Prebid or Google Ad Manager), so they are not within the scope of this discussion.

Some thoughts about building a solution

Requirements and constraints

I'm summarizing the requirements and constraints that need to be addressed to ensure that the new API doesn't compromise the promises of the Privacy Sandbox:

  • The Monitoring API should prevent cross-site and cross-app tracking.
  • The Monitoring API should safeguard the logic of both buyers and sellers, keeping it undisclosed.
  • The Monitoring API should not allow a seller or buyer to access another seller's information or logic.
  • The Monitoring API should avoid exposing a public endpoint that would allow anyone on the page to access it.
  • The Monitoring API should have a negligible impact on browser performance, including both code execution and network load.

Key capabilities

  • The Monitoring API reveals metrics and dimensions that enable publishers to monitor PA performance using auction metrics, seller metrics, bid metrics, impressions, and revenue metrics. If necessary, I can compile a list of typically used dimensions and metrics that publishers use to track ad stack performance.

  • To meet these requirements, access to the Monitoring API should be delegated on behalf of a publisher. It may be beneficial to require Privacy Sandbox enrollment for each domain authorized to receive data. For instance,

    • (1) Publishers could identify the authorized domain using a header response or a hosted file on their domain (such as a .well-known file).
    • (2) Domains (and partners) that are eligible for a Publisher's delegation must meet certain requirements to be a part of the Privacy Sandbox ecosystem. These requirements could include the Enrollment form for the Topics API or Interest Groups (IGs).
  • The Monitoring API should not expose a JavaScript endpoint. For instance, Prebid has implemented a 'getEvents' function, but this approach allows anyone on the page to access performance data. While it has its advantages, it also presents significant drawbacks. From a user privacy standpoint, it's overly permissive and could serve as a backdoor for data leakage.

  • Instead, we could consider making the PA responsible for sending "events" to the delegated domain. The WICG has to define the content of an "event". Reusing the concept of a "worklet" could be beneficial for the following reasons:

    • It allows a degree of personalization, enabling the publisher to format the events into their own "language". The provided code allows a publisher to enrich the "events".
    • A worklet only gives access to the events a publisher should have access to, ensuring that seller/buyer logic is never exposed to a publisher.
  • Since the Monitoring API doesn't necessitate "real-time" operation, most of its execution and network calls can be assigned a low priority. To minimize the number of network calls, events can be batched or even sent after the page has been unloaded. Additionally, the PA can enforce a timeout on code execution or HTTP requests to avoid heavy computations or latencies.
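The batched, low-priority delivery and publisher enrichment described in the bullets above could be sketched as follows. The class name, field names, and enrichment hook are all invented for illustration; in a real worklet, flushing would hand the batch to the browser for low-priority delivery rather than keep it in memory.

```javascript
// Hypothetical sketch of batched analytics event delivery with a
// publisher-provided enrichment step, per the design above.
class AnalyticsBatcher {
  constructor(maxBatchSize) {
    this.maxBatchSize = maxBatchSize;
    this.queue = [];    // events awaiting delivery
    this.flushed = [];  // batches "sent" (stand-in for network delivery)
  }

  // Publisher-provided worklet code can enrich each event into the
  // publisher's own "language" before it is queued.
  record(event, enrich = e => e) {
    this.queue.push(enrich(event));
    if (this.queue.length >= this.maxBatchSize) this.flush();
  }

  // In the browser this would be a low-priority, possibly post-unload
  // network call; here we just move the batch to the "sent" list.
  flush() {
    if (this.queue.length === 0) return;
    this.flushed.push(this.queue);
    this.queue = [];
  }
}

const batcher = new AnalyticsBatcher(2);
batcher.record({ type: 'auctionStart', slot: 'top-banner' });
batcher.record({ type: 'bid', seller: 'https://ssp-a.example', cpm: 1.1 },
               e => ({ ...e, label: 'homepage' })); // publisher enrichment
console.log(batcher.flushed.length); // 1 batch of 2 events
```

Batching by count is the simplest policy; a real design would likely also flush on a timer and on page unload, and enforce the execution timeouts mentioned above.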

#1035 was a less elegant attempt at this same feature request

Hello,

I am seeking feedback from @MattMenke2 and @michaelkleber regarding this issue, particularly to gain insight into what is feasible or what may not align with the PAAPI objectives.

While awaiting their input, I have begun sharing the issue with top publishers in the EU. If necessary, I can ask them to share their current challenges and concerns to gather additional feedback from the sell side and publishers.

Thank you for writing up your proposal. I look forward to talking about it on our weekly call, where this issue is the first item on the agenda!

Linking #430 which is also related.

@michaelkleber and @gpolaert Following up on the conversation we had last week, here's a scoped-down version of the proposal that only calls for aggregated reporting and reserved fenced frame events: Aggregate-Only Proposal

Note: This proposal doesn't directly support the real-time monitoring use case.

@michaelkleber Following up on our discussion a couple of weeks ago, I've created an updated proposal that utilizes the Shared Storage API: Shared Storage Based Proposal

@michaelkleber and @JensenPaul
Following up on our conversation this past Wednesday:

  1. I looked at #1190 and it doesn't seem strongly related to this issue.
  2. This is the updated proposal I mentioned: Shared Storage Based Proposal