Medical social media is the subset of social media restricted to health-care-related topics. Different kinds of people (or personae) contribute to it: patients, caretakers, consultants, pharmacists, medical researchers, and medical journalists. The task addressed here is persona classification: given a blog post as input, the system predicts which persona wrote that post.
This code is written in Python. To use it you will need:
- Python 2.7
- skip-thoughts
You will first need to download the model files, word embeddings, and blog post data (see below). The embedding files (utable and btable) are quite large (>2 GB), so make sure enough disk space is available. The encoder vocabulary is in dictionary.txt.
- wget http://www.cs.toronto.edu/~rkiros/models/dictionary.txt
- wget http://www.cs.toronto.edu/~rkiros/models/utable.npy
- wget http://www.cs.toronto.edu/~rkiros/models/btable.npy
- wget http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz
- wget http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz.pkl
- wget http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz
- wget http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz.pkl
- glove-vectors-web-crawl-2.0GB
- GoogleNews-vectors-negative300.bin
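Before running the pipeline, a quick sanity check (a hypothetical helper, not part of the repository) is to confirm that every downloaded file is present and non-empty:

```python
import os

# File names as listed above; add your GloVe vectors file to the list as well.
REQUIRED_FILES = [
    'dictionary.txt',
    'utable.npy', 'btable.npy',
    'uni_skip.npz', 'uni_skip.npz.pkl',
    'bi_skip.npz', 'bi_skip.npz.pkl',
    'GoogleNews-vectors-negative300.bin',
]

for name in REQUIRED_FILES:
    if not os.path.isfile(name):
        print 'missing: %s' % name
    else:
        print '%-40s %.1f MB' % (name, os.path.getsize(name) / 1e6)
```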
- The Google News vectors file and the GloVe vectors file need to be in the same directory as the code
- All the files mentioned in the "Getting started" section also need to be in the same directory as the code
- Edit the config file in the skip-thoughts-master directory and set the base path to point to the Persona_Classification directory
- Add the path of the skip-thoughts-master directory to your PYTHONPATH environment variable (or append it from Python, as in the sketch below)
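If you prefer to set the path from inside Python, and to verify that the downloaded model loads, a minimal sketch is shown below. The directory path is only an example for your setup; the `load_model`/`encode` calls follow the public skip-thoughts README.

```python
import os
import sys

# Example location of the skip-thoughts checkout; adjust to your setup.
SKIPTHOUGHTS_DIR = os.path.expanduser('~/Persona_Classification/skip-thoughts-master')
sys.path.append(SKIPTHOUGHTS_DIR)

import skipthoughts

# load_model() reads the .npz/.pkl model files and the utable/btable
# embedding tables from the paths set in the skip-thoughts config.
model = skipthoughts.load_model()

# Encode a couple of example sentences into skip-thought vectors.
sentences = ['The patient reported mild side effects.',
             'We compared dosage guidelines across three studies.']
vectors = skipthoughts.encode(model, sentences)
print vectors.shape  # (2, 4800) for the combined uni-skip + bi-skip encoder
```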
- python downsample.py -> creates the document label files
- python create_word_embeddings.py -> creates document embeddings for our corpus using averaged word embeddings (see the sketch after this list)
- python create_sentence_embeddings.py -> creates document embeddings for our corpus using averaged sentence embeddings from the pre-trained skip-thoughts module
- python train_skipthought.py -> trains skip-thoughts on the GloVe/Google News vectors for our sentence corpus
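As a rough illustration of what create_word_embeddings.py produces (a sketch only; the actual script may differ in tokenization and weighting), a document embedding from averaged word vectors can be built like this, assuming gensim is installed:

```python
import numpy as np
from gensim.models import KeyedVectors

# Load the pre-trained Google News vectors (300 dimensions, several GB in memory).
w2v = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin',
                                        binary=True)

def document_embedding(text, dim=300):
    """Average the word vectors of all in-vocabulary tokens in the post."""
    tokens = text.lower().split()
    vectors = [w2v[t] for t in tokens if t in w2v]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

post = 'As a pharmacist I often get asked about drug interactions.'
print document_embedding(post).shape  # (300,)
```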