Description: this demo is for CS294 (Privacy-Preserving Systems).
This tutorial builds a training and testing pipeline for a toy ML prediction problem: predicting whether a passenger in a NYC taxicab ride will give the driver a nontrivial tip. This is a binary classification task. A nontrivial tip is arbitrarily defined as one greater than 10% of the total fare (before tip). We evaluate the model by its F1 score. The task is modeled after the one described in toy-ml-pipeline.
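As a minimal sketch of the label definition and the metric described above (the function names `is_nontrivial_tip` and `f1_score` are hypothetical and are not taken from the pipeline code), the 10% rule and the F1 computation could look like this:

```python
def is_nontrivial_tip(fare_before_tip: float, tip: float, threshold: float = 0.10) -> bool:
    """Return True when the tip exceeds `threshold` (10% by default) of the pre-tip fare."""
    return tip > threshold * fare_before_tip


def f1_score(y_true, y_pred) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)       # predicted tip, tip given
    fp = sum(1 for t, p in pairs if not t and p)   # predicted tip, no tip given
    fn = sum(1 for t, p in pairs if t and not p)   # missed a tip
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    labels = [is_nontrivial_tip(20.0, 3.0), is_nontrivial_tip(20.0, 1.0)]
    print(labels)
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

F1 is a reasonable choice when the two classes are imbalanced, since it ignores true negatives and penalizes both missed tips and false alarms.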
The purpose of this demo is to show how we have incorporated information flow control techniques to help developers retract data belonging to customers who request data deletion. In this demo, we:
- Run training pipeline on Jan 2020 data
- Run inference “weekly” from Feb 1, 2020 to May 31, 2020
- Delete user_109 label (not used in training)
- “Weekly” inference will still run successfully
- Delete user_139 label (used in training)
- Use 30-second threshold (default is 30 days)
- “Weekly” inference will throw errors
- Vary number of labels, measure runtime & space
- Vary number of deleted labels, measure runtime & space
- On committing new labels, vary cardinality and measure runtime
- On propagating through pipeline, vary cardinality and measure runtime
- Clean up deletion experiment
- Run each experiment many times
- Put in paper
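
The deletion steps in the demo (deleting the user_109 and user_139 labels, then rerunning inference) can be illustrated with a small sketch. This is one plausible reading of the behavior, not the project's actual implementation: it assumes inference is refused once a label that was used in training has been deleted for longer than the threshold. `StaleModelError` and `check_model_usable` are hypothetical names introduced only for illustration.

```python
from datetime import datetime, timedelta


class StaleModelError(RuntimeError):
    """Hypothetical error: the model depends on data that was deleted too long ago."""


def check_model_usable(training_ids, deletions, now, threshold=timedelta(days=30)):
    """Raise StaleModelError if a label used in training was deleted more than
    `threshold` ago. `deletions` maps a user id to the time of its deletion request.
    Assumed semantics; the real system may track provenance differently."""
    for user_id, deleted_at in deletions.items():
        if user_id in training_ids and now - deleted_at > threshold:
            raise StaleModelError(f"label for {user_id} was deleted; retrain the model")


if __name__ == "__main__":
    t0 = datetime(2020, 2, 1)
    trained_on = {"user_139"}  # user_109's label was not used in training

    # Deleting a label that was not used in training does not block inference.
    check_model_usable(trained_on, {"user_109": t0}, now=t0 + timedelta(days=7))

    # Deleting a label used in training blocks inference once the threshold passes.
    try:
        check_model_usable(trained_on, {"user_139": t0},
                           now=t0 + timedelta(seconds=31),
                           threshold=timedelta(seconds=30))
    except StaleModelError as err:
        print("inference refused:", err)
```

Under this reading, shortening the threshold from 30 days to 30 seconds simply makes the error observable within the span of a demo.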