MIARS: A Python repository from minxingzhang

Membership Inference Attacks Against Recommender Systems

Minxing Zhang, Zhaochun Ren, Zihan Wang, Pengjie Ren, Zhumin Chen, Pengfei Hu, Yang Zhang

ACM Conference on Computer and Communications Security (CCS) 2021

Introduction

The repository contains two .py files that implement our attack models: one is based on a clustering algorithm, and the other is based on deep learning.

There are three types of datasets: "Interactions", "Recommendations", and "Vectorization".

"Interactions" is formatted as: UserID \t ItemID \t Scores \n
"Recommendations" is formatted as: UserID \t ItemID \t Scores \n
"Vectorization" is formatted as: Vector[i][1] \t Vector[i][2] \t ... \t Vector[i][m] \n (Here, $m$ is the dimension of the feature space, and $i$ indicates that the feature vector corresponds to the $i^{th}$ user.)

Note that, to balance the data, the first half of each Interactions and Recommendations dataset corresponds to Members, and the second half corresponds to Non-Members.
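
As an illustration of the formats above, the following is a minimal sketch of how such files might be read. The function names (`parse_records`, `parse_vectors`, `split_members`) are hypothetical and not part of this repository. The sketch also assumes that "first half" means the first half of the distinct user IDs in sorted order; the description above does not state whether the split is by user or by line.

```python
from typing import List, Tuple

Record = Tuple[int, int, float]  # (UserID, ItemID, Score)


def parse_records(text: str) -> List[Record]:
    """Parse tab-separated 'UserID, ItemID, Scores' lines."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user, item, score = line.split("\t")
        # int() and float() tolerate surrounding spaces
        records.append((int(user), int(item), float(score)))
    return records


def parse_vectors(text: str) -> List[List[float]]:
    """Parse one feature vector per line; row i belongs to the i-th user."""
    return [
        [float(value) for value in line.split("\t")]
        for line in text.splitlines()
        if line.strip()
    ]


def split_members(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Assumption: the first half of the sorted user IDs are Members."""
    users = sorted({user for user, _, _ in records})
    member_users = set(users[: len(users) // 2])
    members = [r for r in records if r[0] in member_users]
    non_members = [r for r in records if r[0] not in member_users]
    return members, non_members
```

For a file with four users numbered 0 to 3, `split_members` would place the records of users 0 and 1 in the Member list and those of users 2 and 3 in the Non-Member list.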

The guidelines below describe how to construct the datasets for the attack:

Datasets for "Interactions" are derived by the following steps:

Randomly divide the original dataset into three subsets, one each for the Shadow Model, the Target Model, and Vectorization.
In each subset, remove the users with fewer than 20 records.
Relabel users and items with consecutive numbers.
Store the records in the above format, sorted by user number.
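
The steps above can be sketched roughly as follows. This is an illustrative outline, not the repository's implementation: the function names are hypothetical, and it assumes the random split is made per user (so that no user appears in more than one subset), which the steps above do not specify.

```python
import random
from collections import defaultdict


def split_users(records, n_subsets=3, seed=0):
    """Step 1 (assumed per-user): randomly assign users to subsets."""
    by_user = defaultdict(list)
    for user, item, score in records:
        by_user[user].append((user, item, score))
    users = sorted(by_user)  # fixed order so the seed is reproducible
    random.Random(seed).shuffle(users)
    subsets = [[] for _ in range(n_subsets)]
    for index, user in enumerate(users):
        subsets[index % n_subsets].extend(by_user[user])
    return subsets


def filter_and_relabel(records, min_records=20):
    """Steps 2-4: drop sparse users, relabel IDs, sort by user number."""
    counts = defaultdict(int)
    for user, _, _ in records:
        counts[user] += 1
    kept = [r for r in records if counts[r[0]] >= min_records]

    user_ids, item_ids, relabeled = {}, {}, []
    for user, item, score in kept:
        # setdefault assigns the next consecutive number on first sight
        new_user = user_ids.setdefault(user, len(user_ids))
        new_item = item_ids.setdefault(item, len(item_ids))
        relabeled.append((new_user, new_item, score))
    relabeled.sort(key=lambda record: record[0])
    return relabeled


def to_lines(records):
    """Serialize records as 'UserID, ItemID, Scores' separated by tabs."""
    return "".join(f"{u}\t{i}\t{s}\n" for u, i, s in records)
```

Each of the three subsets returned by `split_users` would then be passed through `filter_and_relabel` and written out with `to_lines`.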

Datasets for "Recommendations" are generated by the corresponding recommender system (Item-based Collaborative Filtering, Latent Factor Model, or Neural Collaborative Filtering) and stored in the above format.

Item-based Collaborative Filtering computes similarities among items based on user behavior. For instance, if a user bought item A and item B at the same time, items A and B are considered more related. The recommender system then uses the calculated similarities to provide users with the items most relevant to those they have interacted with.
Latent Factor Model aims to find latent factors that can represent both item attributes and user preferences. Specifically, the user-item matrix is decomposed into two lower-dimensional matrices, whose shared lower-dimensional space is spanned by the latent factors (as bases). The predicted preferences of users for items are then the product of these two matrices, from which the recommender system can select recommendations for users.
The implementation of Neural Collaborative Filtering follows this work.
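
To make the first two recommenders above more concrete, here is a small self-contained sketch. It is not the code used to produce the datasets. For Item-based Collaborative Filtering it assumes a co-occurrence similarity normalized by item popularity, which is one common choice; the similarity measure actually used is not stated here. For the Latent Factor Model it only shows the prediction step (the product of the two factor matrices), with factor values made up for illustration. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarities(user_items):
    """Assumed similarity: co-occurrences / sqrt(count_i * count_j)."""
    item_count = defaultdict(int)
    co_count = defaultdict(int)
    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        for i, j in combinations(sorted(items), 2):
            co_count[(i, j)] += 1
    sim = defaultdict(dict)
    for (i, j), count in co_count.items():
        score = count / math.sqrt(item_count[i] * item_count[j])
        sim[i][j] = score
        sim[j][i] = score
    return sim


def recommend_item_cf(user, user_items, sim, k=3):
    """Score unseen items by summed similarity to the user's items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, score in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # highest score first; ties broken by item ID
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def predict_lfm(user_factors, item_factors):
    """Predicted preferences: user factor matrix times item factors transposed."""
    return [
        [sum(p * q for p, q in zip(user_vec, item_vec)) for item_vec in item_factors]
        for user_vec in user_factors
    ]


# Tiny usage example with made-up data.
history = {0: {"A", "B"}, 1: {"A", "B", "C"}, 2: {"C"}}
similarity = item_similarities(history)
top_for_user_0 = recommend_item_cf(0, history, similarity)
preferences = predict_lfm([[1.0, 0.0], [0.5, 0.5]], [[2.0, 1.0], [0.0, 4.0]])
```

In either case, the top-scored items per user, together with their scores, are what would be written out in the "Recommendations" format.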

Datasets for "Vectorization" are derived following the method in the paper.

Dataset Construction

The implementations of dataset construction can be found here.

Reference

To acknowledge the use of our work, please cite our paper:

@inproceedings{zhang2021membership,
  title={Membership inference attacks against recommender systems},
  author={Zhang, Minxing and Ren, Zhaochun and Wang, Zihan and Ren, Pengjie and Chen, Zhumin and Hu, Pengfei and Zhang, Yang},
  booktitle={Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security},
  pages={864--879},
  year={2021}
}
