This project is part of a larger investigation into detecting privacy leaks in human-written text on social media. Here, we aim to detect a person's preferences and the things they do or like, using the subject's text conversations. The repository contains:
- Annotated dataset
- ML models to recognize the defined entities
Some more context is given below, but the project is still a collection of investigation notes and tests. If you wish to know more about the investigation, feel free to contact the author.
The text data was taken from Topical-Chat. The conversations there are very similar in style to those on social networks such as Reddit. The sentences were annotated using the web tool WebAnno and the following scheme:
- Subject
- Preference
- Activity
- Object
Examples:
Various steps for cleaning, organizing, and formatting the data were carried out; they can be found in the scripts and the corpus_porcess file.
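
As a rough illustration of the kind of formatting step a sequence-labeling model usually needs, the sketch below converts token-level spans labeled with the scheme above into BIO tags. This is not taken from the repository's scripts: the span layout `(start, end, label)`, the function name `spans_to_bio`, and the example sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

# Labels from the annotation scheme described above.
LABELS = ("Subject", "Preference", "Activity", "Object")

# A span is (start_token, end_token_exclusive, label). This layout is an
# assumption for illustration, not the repository's actual format.
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end)}")
        # First token of the span gets B-, the rest get I-.
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Invented sentence, not taken from the dataset.
tokens = ["I", "love", "playing", "the", "guitar"]
spans = [
    (0, 1, "Subject"),
    (1, 2, "Preference"),
    (2, 3, "Activity"),
    (3, 5, "Object"),
]
for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

Tokens outside any span keep the tag `O`, and overlapping spans are rejected here for simplicity; a real pipeline might need to handle nested or overlapping annotations differently.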