PHINE: Private Heterogeneous Information Network Embeddings

Heterogeneous Information Networks (HINs) are a promising alternative for representing clinical events that contain inter-related, multi-typed data, such as patients and their relationships with medical diagnoses, symptom descriptions, anamnesis, and observations. Network embedding methods map an information network to a latent space (i.e., an embedding space) that preserves the network structure in a low-dimensional vector space, thereby enabling the use of machine learning methods based on vector-space models. However, since most network embedding methods do not include strategies for omitting users' private features, adversaries can use the embeddings to infer sensitive user information. Moreover, recently proposed privacy-preserving methods are suitable only for homogeneous networks. We propose Private Heterogeneous Information Network Embeddings (PHINE), an approach for privacy-preserving heterogeneous network embedding of clinical events. We explore Graph Autoencoders (GAE) with an objective function that simultaneously maximizes the embeddings' usefulness for classification tasks (i.e., preserving HIN properties and topology) and minimizes the effectiveness of inference attacks on the embeddings (i.e., hiding private information). To the best of our knowledge, this is the first privacy-preserving embedding approach for heterogeneous networks of clinical event data. The experimental results show that PHINE offers a competitive trade-off between privacy preservation and utility in feature prediction.
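
The two-sided objective described above can be illustrated with a minimal sketch. This is not the PHINE implementation: it assumes a common adversarial formulation in which the encoder minimizes a utility loss minus a weighted attacker loss, and it assumes both tasks are binary. The function names and the `privacy_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def privacy_utility_objective(utility_probs, utility_labels,
                              attack_probs, private_labels,
                              privacy_weight=1.0):
    """Hypothetical encoder objective (lower is better for the encoder).

    utility_probs: predictions for the useful (non-private) task.
    attack_probs:  an attacker's predictions of the private attribute.
    The attacker's loss is subtracted, so embeddings that make the
    attacker fail are rewarded.
    """
    utility_loss = binary_cross_entropy(utility_probs, utility_labels)
    attack_loss = binary_cross_entropy(attack_probs, private_labels)
    return utility_loss - privacy_weight * attack_loss


# Same utility predictions, two different attackers (assumed toy values).
labels = [1, 0]
good_utility = [0.9, 0.1]
guessing_attacker = [0.5, 0.5]   # attacker learns nothing from the embeddings
accurate_attacker = [0.9, 0.1]   # attacker recovers the private attribute

private_score = privacy_utility_objective(good_utility, labels, guessing_attacker, labels)
leaky_score = privacy_utility_objective(good_utility, labels, accurate_attacker, labels)
```

In this sketch, the embeddings that leave the attacker at chance level receive a lower (better) objective value than the embeddings that leak the private attribute, even though both are equally useful for the utility task. Under these assumptions, `privacy_weight` controls how strongly privacy is traded against utility.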

We use the Synthea demo dataset (available here) for the experiments. Our approach is based on Adversarial Privacy Graph Embeddings (available here).