Quantifying privacy loss in social networks
Proposal:
This thesis aims to understand the information content of a user based on what they publicly share. The study integrates several social network platforms (currently Twitter, Foursquare and Instagram) and examines how sharing content on one of these networks affects the inferences that can subsequently be made on the others.
For example, given a user's Twitter timeline, can we infer which types of venues they are most likely to visit? Furthermore, if this is not possible at some point in time, can it be done later using their new tweets? Can we infer the average amount a user spends at restaurants from their Instagram hashtags? Where has the user traveled, and where are they going next? Answers to all of these questions carry value; the last one in particular has advertising potential. Finally, by reconstructing and aggregating the trails a user leaves through everyday internet usage, how much insight can we gain into their life?
The problems we want to explore in this thesis are: how to quantify the value of information, in terms of both improved prediction performance and monetary value, and to what extent integrating publicly available information from different social networks yields an accurate model of the user.
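As a rough illustration of "value of information in terms of improved prediction performance", the sketch below scores a venue-type predictor before and after new content is added and reports the drop in log loss, in bits. This is only one possible metric and not the one this thesis settles on (defining the metrics is still an open stage). The function names, the venue categories and the probabilities are all hypothetical and chosen purely for illustration.

```python
import math

def log_loss_bits(predictions, true_labels):
    """Mean negative log2-probability assigned to the true label.

    predictions: list of dicts mapping label -> predicted probability.
    true_labels: list of the labels that actually occurred.
    """
    total = sum(-math.log2(p[y]) for p, y in zip(predictions, true_labels))
    return total / len(true_labels)

def information_gain_bits(before, after, true_labels):
    """Reduction in log loss (bits per prediction) after adding new data."""
    return log_loss_bits(before, true_labels) - log_loss_bits(after, true_labels)

# Hypothetical example: two users, two venue types.
true_venues = ["cafe", "bar"]

# Predictions from a baseline model (no informative signal yet).
baseline = [{"cafe": 0.5, "bar": 0.5}, {"cafe": 0.5, "bar": 0.5}]

# Predictions after the first user's new tweets are taken into account.
with_new_tweets = [{"cafe": 0.8, "bar": 0.2}, {"cafe": 0.5, "bar": 0.5}]

gain = information_gain_bits(baseline, with_new_tweets, true_venues)
print(f"Gain from new tweets: {gain:.3f} bits per prediction")
```

A positive gain means the new content made the true venue types more predictable; a gain of zero means it added nothing for this task. Attaching a monetary value to such a gain is a separate step that this sketch does not attempt.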
Stages:
- Data collection [x]
- Baseline inference models [x]
- Progressive inference models [ ]
- Defining information content metrics [ ]
- Attributing monetary value to information [ ]
For a more detailed description of the inference tasks, the dataset and further steps, refer to the wiki.