WICG/turtledove

Using Aggregated Data for Machine Learning without (Unlabeled) Granular Data

mbowiewilson opened this issue · 2 comments

In Criteo's privacy preserving machine learning competition, data scientists from Meta showed that logistic regression models can be trained with granular unlabeled data and aggregate labeled data. (They also used granular labeled data for model validation and hyperparameter tuning, but let's ignore that for the sake of this issue.) After that competition, I felt optimistic about the feasibility of using data from the aggregate reporting API to optimize bidding with machine learning on behalf of advertisers. However, I have my doubts after thinking this through a bit more. The reason is that in TURTLEDOVE (after granular data from FLEDGE is phased out) we won't get the granular unlabeled data that Meta's winning approach needs.

To illustrate this issue, let's consider training a click-through-rate prediction model, which is probably the most fundamental component of how ad buyers value ad opportunities. To train such a model with Meta's approach we'd need two things. First, we'd need aggregate labeled data pertaining to impressions and clicks, which, as I understand it, TURTLEDOVE will enable. Second, we'd need granular impression data (whether labeled or not). However, my understanding is that no such granular impression data will be available.
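To make the two requirements concrete, here is a minimal sketch of why logistic regression can, in principle, be fit from granular unlabeled rows plus aggregate labeled sums. It relies only on the standard identity that the log-loss gradient, sum over i of (sigmoid(w·x_i) - y_i) x_i, splits into a label-free term and the per-feature sum of x_i over clicked impressions. This is not necessarily the method Meta's team used; the function names, the synthetic data, and the absence of noise on the aggregate are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def aggregate_label_sums(X, y):
    """Per-feature sum of x over clicked rows. The only place labels are used.

    Assumption: stands in for what an aggregate report could provide (no noise).
    """
    d = len(X[0])
    return [sum(x[j] * yi for x, yi in zip(X, y)) for j in range(d)]


def grad_from_aggregates(w, X, label_sums):
    """Log-loss gradient from unlabeled rows X and the aggregate label_sums."""
    grad = [-s for s in label_sums]
    for x in X:
        p = sigmoid(dot(w, x))
        for j in range(len(w)):
            grad[j] += p * x[j]
    return grad


def grad_granular(w, X, y):
    """Reference gradient that sees every (x, y) pair; used only for comparison."""
    grad = [0.0] * len(w)
    for x, yi in zip(X, y):
        p = sigmoid(dot(w, x))
        for j in range(len(w)):
            grad[j] += (p - yi) * x[j]
    return grad


# Synthetic impressions: a bias term plus two made-up features.
random.seed(0)
true_w = [-1.0, 2.0, -1.5]
X = [[1.0, random.random(), random.random()] for _ in range(200)]
y = [1 if random.random() < sigmoid(dot(true_w, x)) else 0 for x in X]

label_sums = aggregate_label_sums(X, y)  # labels are not touched after this line

# Plain gradient descent using only X and label_sums.
w = [0.0, 0.0, 0.0]
for _ in range(200):
    g = grad_from_aggregates(w, X, label_sums)
    w = [wj - 0.5 * gj / len(X) for wj, gj in zip(w, g)]
```

The sketch still loops over every row of `X` at each step, which is exactly the granular unlabeled impression data the issue is about.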

DSPs do have a few sources of granular (unlabeled) data in TURTLEDOVE, such as events on advertiser sites, contextual requests from TURTLEDOVE, and interest group requests from TURTLEDOVE. However, none of these types of unlabeled data are applicable to the click-through-rate prediction task I am interested in here. Unfortunately, other entrants in Criteo's PPML competition (such as a team I was on from NextRoll) relied even more heavily on granular data than Meta did, so it's not the case that we can just fall back on a different, somewhat less successful approach to using data from the aggregate reporting API. As far as I know, competition entries that relied purely on aggregate data (we worked on some of these at NextRoll) performed terribly compared to approaches that used even a tiny amount of granular labeled data.

So unless I am missing something, I think we are more or less back to square one in terms of how DSPs are supposed to optimize bidding using machine learning on behalf of advertisers in TURTLEDOVE when FLEDGE's granular data gets phased out.

Hi, my name is Andres and I am leading a research effort on private ML modeling.

The problem raised here is a valid concern and one of the reasons we are still allowing event level reporting on FLEDGE. As you mention, even after removal of event level reporting, there are still some sources of granular data. Indeed, granular contextual information will still be sent in the ad request, and interest group information can be observed when a user joins the interest group. The challenge, of course, is that these sources of information are never observed together. We are currently exploring ways to use this event level information. Some options worth considering are:

  1. Train factorized models that leverage contextual information (which consists of labeled unaggregated data) combined with aggregate interest group information multipliers.
  2. Work with graphical models that can be trained using only aggregate information such as Naive-Bayes or generalizations of this model.
  3. Evaluate the effects of training differentially private models (via the use of DP-SGD for instance).

Option 3 is a form of aggregation that would require a more sophisticated trusted server and we are looking into the viability of such servers. As we evaluate these options we would appreciate any suggestions on other possible training mechanisms.
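As a rough illustration of option 2, the sketch below fits a Naive Bayes click predictor from nothing but per-feature counts split by label, which is the kind of statistic an aggregate report could plausibly carry. The feature names, the toy rows, the Laplace smoothing constant, and the absence of noise on the counts are assumptions for illustration, not part of any proposal in this thread.

```python
import math
from collections import defaultdict


def aggregate_counts(rows, labels):
    """Simulate aggregate reports: counts keyed by (feature index, value, label)."""
    counts = defaultdict(int)
    label_totals = [0, 0]
    for row, y in zip(rows, labels):
        label_totals[y] += 1
        for j, v in enumerate(row):
            counts[(j, v, y)] += 1
    return counts, label_totals


def naive_bayes_ctr(row, counts, label_totals, cardinalities, alpha=1.0):
    """P(click | row) under the Naive Bayes independence assumption.

    alpha is a Laplace smoothing constant; cardinalities[j] is the number of
    distinct values feature j can take.
    """
    total = sum(label_totals)
    log_scores = []
    for y in (0, 1):
        s = math.log((label_totals[y] + alpha) / (total + 2 * alpha))
        for j, v in enumerate(row):
            num = counts.get((j, v, y), 0) + alpha
            den = label_totals[y] + alpha * cardinalities[j]
            s += math.log(num / den)
        log_scores.append(s)
    # Normalize in log space for numerical stability.
    m = max(log_scores)
    e0 = math.exp(log_scores[0] - m)
    e1 = math.exp(log_scores[1] - m)
    return e1 / (e0 + e1)


# Toy data: (contextual feature, interest group feature), both hypothetical.
rows = [
    ("news", "shoes"), ("news", "shoes"), ("sports", "shoes"),
    ("sports", "cars"), ("news", "cars"), ("sports", "cars"),
]
labels = [1, 1, 0, 0, 0, 0]

counts, label_totals = aggregate_counts(rows, labels)
p_hi = naive_bayes_ctr(("news", "shoes"), counts, label_totals, [2, 2])
p_lo = naive_bayes_ctr(("sports", "cars"), counts, label_totals, [2, 2])
```

Each count involves a single feature and the label, so the contextual and interest group features never need to be observed together. The price is the independence assumption, which ignores interactions between the two.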

Hi,

I think it's important to note that the setting in the Criteo Challenge was more restrictive than FLEDGE + aggregate reporting (as proposed e.g. in #194). More specifically, the challenge used a statically generated dataset - in FLEDGE, on the other hand, it would be possible to dynamically compute the aggregatable reports emitted in each generateBid invocation. (For example, in generateBid it's possible to calculate the model's gradient.)
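One possible reading of the gradient idea, sketched as a plain Python simulation rather than real generateBid or aggregation service code: each bid-time invocation emits the label-free part of the logistic regression gradient for the current weights, click-time reports emit the feature vector of clicked impressions, and only the sums reach the DSP. The function names, the two-report split, and the Gaussian noise are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bid_time_contribution(w, x):
    """Hypothetical report computed where x is visible: sigmoid(w.x) * x."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    return [p * xj for xj in x]


def click_time_contribution(x):
    """Hypothetical report emitted only for clicked impressions: x itself."""
    return list(x)


def aggregate(contributions, dim, noise_scale=0.0, rng=random):
    """Stand-in for an aggregation service: per-coordinate sum plus optional noise."""
    total = [0.0] * dim
    for c in contributions:
        for j in range(dim):
            total[j] += c[j]
    if noise_scale > 0.0:
        total = [t + rng.gauss(0.0, noise_scale) for t in total]
    return total


# Simulated impressions: (features, clicked). The DSP never reads these rows
# directly below; it only sees the two aggregated vectors.
random.seed(1)
w = [0.0, 0.0]
impressions = [([1.0, random.random()], random.random() < 0.3) for _ in range(100)]

bid_reports = [bid_time_contribution(w, x) for x, _ in impressions]
click_reports = [click_time_contribution(x) for x, clicked in impressions if clicked]

# Aggregated gradient of the log loss at the current weights w.
grad = [
    a - b
    for a, b in zip(aggregate(bid_reports, 2), aggregate(click_reports, 2))
]
```

With `noise_scale=0.0` the result equals the exact gradient over all impressions. A real deployment would add noise and would need a fresh round of reports after each weight update, which is where the dynamic reporting described above matters.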

This additional flexibility in reporting allows for an entire class of approaches that were impossible to submit in the Criteo Challenge. I don't yet have a definitive assessment of the final performance of such methods in the FLEDGE setting, as we are still experimenting internally, but I'm optimistic that a "sensible" model could be optimised this way.

Best regards,
Jonasz