A repository for exploring how adversarial machine learning can provide privacy against face recognition.
`preprocess_vggface.py` has two modes of operation:
- both rely on the VGG Face dataset being under `image_directory`, organized as `identity/image.jpg`
- `preprocess` reads in the VGG Face-downloaded `bbox_file`, crops the images, prewhitens them, and saves them all under one giant `.h5` file with path `output_directory/nXXXXXX/images.h5`
- `write_embeddings` generates embeddings from the model and the previously written images and saves them under one giant `.h5` file with path `output_directory/nXXXXXX/embeddings.h5`
- the mode of operation needs to be specified with the `--op` option
- if a file of subsampled identities (in the format generated by `sample_identities.py`) is provided with the `--sampled_identities` option, then only those identities and images are processed
- see `run_preprocess_subsample.sh` for an example of using this script twice to a) generate the preprocessed images and then b) compute and write the embeddings
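The prewhitening step mentioned above most likely follows the usual FaceNet-style per-image standardization; here is a minimal sketch under that assumption (the exact implementation in `preprocess_vggface.py` may differ):

```python
import numpy as np

def prewhiten(x):
    """Per-image standardization (FaceNet-style prewhitening sketch):
    subtract the image mean and divide by its standard deviation,
    flooring the std to avoid blow-ups on near-constant images."""
    x = x.astype(np.float64)
    mean = x.mean()
    std_adj = max(x.std(), 1.0 / np.sqrt(x.size))
    return (x - mean) / std_adj
```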
`run_adversarial_attacks.py` generates adversarially perturbed images and saves them under `output_directory/nXXXXXX/attack_type/epsilon_XX.h5`. The resulting file has two datasets: `embeddings` and `images`.
- `self_distance`: push the embedding as far away from where it originally was as possible
  - resulting images stored as `VGG_BASE/test_perturbed/nXXXXXX/self_distance/epsilon_XXX.h5`
- `target_image`: pick an image of a different subject and use its embedding as a target
  - resulting images stored as `VGG_BASE/test_perturbed/nXXXXXX/target_image/epsilon_XXX.h5`
- [DEPRECATED/ABANDONED] `random_target`: sample a vector in the output space at random to use as the target
- `community_naive_same`: modify all images not belonging to identity A so that they embed to the same, randomly sampled vector corresponding to A
  - resulting images stored as `VGG_BASE/test_perturbed_sampled/nXXXXXX/community_naive_same/nYYYYYY/epsilon_ZZZ.h5`, where the `XXXXXX` identity is the ground truth and the `YYYYYY` identity is the target
- `community_naive_random`: modify all images not belonging to identity A so that they embed to a randomly chosen vector corresponding to A (the target is resampled with repetition for each image that is to be adversarially modified)
  - Note: when evaluating this attack, it is important that the negative set not include the image that served as the query image, because the adversary (the community seeking to preserve privacy) should not have access to the query image or its embedding. Therefore, when this attack runs, the adversarial `.h5` dataset (`epsilon_{}.h5`) also includes `target_indices`. These are read by the recall computation function in `utils.py` and interpreted as indices into the clean embeddings of identity A. So if identity B's image at index `n` was modified to match the embedding of identity A's image at index `m`, the dataset at `output_directory/B/community_naive_random/A/epsilon_XX.h5` should contain a `target_indices` array whose `n`-th element is `m`.
- `community_sample_gaussian_model`: modify all images not belonging to identity A so that they embed to a vector randomly sampled from a Gaussian with mean and standard deviation matching the vectors of A
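All of the attack modes above share one optimization core: perturb an image within an epsilon budget so that its embedding either moves away from its original location (`self_distance`) or toward a chosen target vector (the targeted modes). Here is a minimal PGD-style sketch of that loop, using a hypothetical linear embedder `embed_W` as a stand-in for the real face-embedding network:

```python
import numpy as np

def perturb(image, embed_W, epsilon, target_emb=None, steps=10):
    """Sketch of the shared attack loop (hypothetical linear embedder).
    target_emb is None -> self_distance: maximize distance from the
    original embedding; target_emb given -> targeted modes: minimize
    distance to the target embedding."""
    orig_emb = embed_W @ image
    alpha = epsilon / steps
    # small random start so the gradient is nonzero for self_distance
    adv = image + np.random.uniform(-alpha, alpha, image.shape)
    for _ in range(steps):
        emb = embed_W @ adv
        if target_emb is None:
            grad = 2 * embed_W.T @ (emb - orig_emb)  # d||emb - orig||^2
            adv = adv + alpha * np.sign(grad)        # ascend: push away
        else:
            grad = 2 * embed_W.T @ (emb - target_emb)
            adv = adv - alpha * np.sign(grad)        # descend: pull toward
        adv = np.clip(adv, image - epsilon, image + epsilon)  # L_inf budget
    return adv
```

The projection onto the L-infinity ball keeps the perturbation within the `epsilon_XX` budget that names the output files.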
In general, we are following the instructions here.
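The `target_indices` bookkeeping described under `community_naive_random` can be sketched as follows. This is a hypothetical helper, not the actual function in `utils.py`: it builds the negative set for one adversarial image by dropping the clean embedding that image was targeted at, since the adversary is assumed never to have seen that query image:

```python
import numpy as np

def negatives_for_adversarial(clean_A_embeddings, target_indices, n):
    """Negative set for identity B's adversarial image at index n:
    all clean embeddings of identity A except the one at
    target_indices[n] (the query image the attack targeted)."""
    mask = np.ones(len(clean_A_embeddings), dtype=bool)
    mask[target_indices[n]] = False
    return clean_A_embeddings[mask]
```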