Secure and Private AI Scholarship Challenge

This is a repository with my work and notes from the Udacity Secure and Private AI Scholarship Challenge, sponsored by Facebook. I participated in this challenge in the summer of 2019.

The course is taught by Mat Leonard, the head of Udacity's School of AI, and Andrew Trask, the leader of OpenMined.org and a researcher at DeepMind.

Two repositories that hold learning resources used in this course are:

Content

  • lesson 2 is an intro to PyTorch
  • lessons 3 to 6 cover differential privacy and PATE analysis
  • lesson-7 is on federated learning
  • lesson-8 is on securing federated learning
  • lesson-9 is on encrypting deep learning
  • PATE MNIST Analysis is a proposed project on using global differential privacy to train a model on an unlabeled dataset

PATE MNIST analysis

This project concerns using global differential privacy to train a model on an unlabeled dataset, using predictions from interface-compatible external models as labels.

Task: Assume we have a private unlabeled dataset and access to external models trained on similar datasets that expose a common interface. We want to train our own model on our dataset, using the predictions of the external models as labels, while keeping both the external models' data and our own within some privacy parameters.

Here we simulate the whole process, from training the external (teacher) models to training our own (student) model, and measure the resulting epsilon privacy cost using PATE analysis.
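At the heart of PATE is a noisy vote: each external ("teacher") model predicts a label for one of our unlabeled examples, and we keep only the most-voted class after adding Laplacian noise scaled to a per-query epsilon budget. A minimal sketch of that aggregation step (the helper name noisy_vote is mine, not from the notebooks):

import numpy as np

def noisy_vote(votes, epsilon, num_classes=10):
    """Return the most-voted class after adding Laplacian noise to the vote counts."""
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    counts += np.random.laplace(loc=0.0, scale=1.0 / epsilon, size=num_classes)
    return int(np.argmax(counts))

Each such query spends part of the privacy budget; the PATE analysis at the end of this README estimates how much is spent in total.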

Frameworks used in this project:

  • pytorch
  • numpy
  • matplotlib
  • pysyft

Below are some code snippets highlighting the method:

# Detecting and using CUDA for training, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Inside the training loop, each batch is moved to the same device as the model
images, labels = images.to(device), labels.to(device)
# Simulating several separate initial datasets by splitting the shuffled indices equally
indices = list(range(dataset_size))

np.random.seed(random_seed)
np.random.shuffle(indices)

# np.array_split divides the indices into no_datasets roughly equal chunks
indices = [chunk.tolist() for chunk in np.array_split(indices, no_datasets)]

# Creating data samplers and loaders (one per simulated external dataset):
from torch.utils.data import SubsetRandomSampler

samplers = [SubsetRandomSampler(i) for i in indices]
train_dataloaders = [torch.utils.data.DataLoader(dataset_train, batch_size=32, sampler=s)
                     for s in samplers]
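# The Network class instantiated below is not defined in this README; it is the
# simple feed-forward classifier used throughout the course notebooks. A minimal
# sketch, assuming the constructor signature Network(input_size, output_size,
# hidden_layers) seen below (the notebook version may differ):
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.2):
        super().__init__()
        # First hidden layer takes the flattened input (784 pixels for MNIST)
        self.hidden_layers = nn.ModuleList([nn.Linear(input_size, hidden_layers[0])])
        # Chain the remaining hidden layer sizes together
        self.hidden_layers.extend([nn.Linear(h1, h2)
                                   for h1, h2 in zip(hidden_layers[:-1], hidden_layers[1:])])
        self.output = nn.Linear(hidden_layers[-1], output_size)
        self.dropout = nn.Dropout(p=drop_p)

    def forward(self, x):
        x = x.view(x.shape[0], -1)  # flatten images to vectors
        for layer in self.hidden_layers:
            x = self.dropout(F.relu(layer(x)))
        return F.log_softmax(self.output(x), dim=1)  # log-probabilities over classes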
# Training a model (one per simulated external dataset; i indexes the dataset)
model = Network(784, 10, [512, 256, 128])
criterion = nn.NLLLoss()  # assumed loss for a log-softmax classifier
optimizer = optim.Adam(model.parameters(), lr=0.001)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

acc = train(model, train_dataloaders[i], dataloader_test, criterion, optimizer, epochs=1)
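# The train helper used above is also not shown in this README. A minimal sketch of
# what such a helper could look like, assuming it trains for the given number of
# epochs and evaluates on the test loader (the notebook version reports progress
# every few batches rather than once per epoch):
def train(model, trainloader, testloader, criterion, optimizer, epochs=1):
    device = next(model.parameters()).device  # train on whichever device the model is on
    accuracy = 0.0
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for images, labels in trainloader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        # Evaluate on the held-out test set after each epoch
        model.eval()
        test_loss, correct = 0.0, 0
        with torch.no_grad():
            for images, labels in testloader:
                images, labels = images.to(device), labels.to(device)
                output = model(images)
                test_loss += criterion(output, labels).item()
                correct += (output.argmax(dim=1) == labels).sum().item()
        accuracy = correct / len(testloader.dataset)
        print(f"Epoch: {epoch+1}/{epochs}.. "
              f"Training Loss: {running_loss/len(trainloader):.4f}.. "
              f"Test Loss: {test_loss/len(testloader):.4f}.. "
              f"Test Accuracy: {accuracy:.4f}")
    return accuracy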

And a sample of the training output:

Epoch: 1/1.. 
Training Loss: 1.9363..  Test Loss: 1.0918..  Test Accuracy: 0.6603
Training Loss: 1.0398..  Test Loss: 0.6290..  Test Accuracy: 0.7999
Training Loss: 0.7456..  Test Loss: 0.4959..  Test Accuracy: 0.8483
Training Loss: 0.6424..  Test Loss: 0.4478..  Test Accuracy: 0.8605
Training Loss: 0.6003..  Test Loss: 0.3816..  Test Accuracy: 0.8854
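
The predictions matrix and consensus labels consumed by the PATE analysis below are produced by running every trained teacher over our own unlabeled data and aggregating their votes. A minimal sketch, assuming teacher_models holds the trained external models and dataloader_student iterates over our private examples (both names are mine, not from the notebooks), reusing the noisy_vote helper sketched earlier with the same per-query budget eps used in the analysis:

# Each row holds one teacher's predicted labels for every student example
predictions = []
for teacher in teacher_models:
    teacher.eval()
    teacher_preds = []
    with torch.no_grad():
        for images, _ in dataloader_student:
            images = images.to(device)
            teacher_preds.extend(teacher(images).argmax(dim=1).cpu().tolist())
    predictions.append(teacher_preds)

# Noisy-argmax aggregation of the per-example teacher votes
consensus = [noisy_vote(votes, eps) for votes in np.asarray(predictions).T]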
# Analysing the data-dependent and data-independent epsilon using PATE
data_dep_eps, data_ind_eps = pate.perform_analysis(teacher_preds=np.asarray(predictions),
                                                    indices=consensus, noise_eps=eps,
                                                    delta=1e-5, moments=4)
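
The pate module used above ships with PySyft; in the 2019-era 0.1.x releases it lived under the differential-privacy subpackage, though the exact import path may vary between versions. A hedged sketch of the import and of inspecting the result:

from syft.frameworks.torch.differential_privacy import pate  # import path assumed for 2019-era PySyft

print("Data-independent epsilon:", data_ind_eps)
print("Data-dependent epsilon:", data_dep_eps)

The data-dependent epsilon is usually the tighter bound: when the teachers agree on most examples, the released labels reveal little about any single teacher's training data.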