
Han-Chin Shing
Bor-Chun Chen

Convert tweets into txt files, one file per person and one line per tweet. The files serve as input to LIWC2015 for content analysis.
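A minimal sketch of this conversion, assuming the tweets are already available as (user_id, text) pairs. The function name write_liwc_files and the input shape are hypothetical. The `liwc.` filename prefix only mirrors the example path `../data/txt/liwc.7_kg7BsyyTy8.txt` used elsewhere in this README. Internal whitespace is collapsed so that a tweet containing line breaks still occupies exactly one line.

```python
import os
import re
from collections import defaultdict
from typing import Iterable, Tuple


def write_liwc_files(tweets: Iterable[Tuple[str, str]], out_dir: str,
                     prefix: str = 'liwc.') -> None:
    """Write one file per user with one tweet per line (hypothetical helper)."""
    by_user = defaultdict(list)
    for user_id, text in tweets:
        # Collapse newlines and runs of whitespace so each tweet stays on one line.
        by_user[user_id].append(re.sub(r'\s+', ' ', text).strip())
    os.makedirs(out_dir, exist_ok=True)
    for user_id, lines in by_user.items():
        path = os.path.join(out_dir, prefix + user_id + '.txt')
        with open(path, 'w', encoding='utf-8') as f:
            f.write('\n'.join(lines) + '\n')
```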


Result of running LIWC2015 on the txt files generated by parse_data_for_LIWC.

Load the results in LIWC2015_Results.csv into a feature dictionary.
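A minimal sketch with csv.DictReader. It assumes the CSV has one row per input file, a key column named `Filename` (an assumption; check the header of your LIWC2015 export), and numeric category columns. The function name load_liwc_features is hypothetical, and any column whose value cannot be parsed as a number is skipped.

```python
import csv
from typing import IO, Dict


def load_liwc_features(f: IO[str], key_column: str = 'Filename') -> Dict[str, Dict[str, float]]:
    """Map each row's key (assumed: the input filename) to its numeric LIWC scores."""
    features: Dict[str, Dict[str, float]] = {}
    for row in csv.DictReader(f):
        key = row.pop(key_column)
        values: Dict[str, float] = {}
        for name, raw in row.items():
            try:
                values[name] = float(raw)
            except (TypeError, ValueError):
                continue  # skip non-numeric or missing cells
        features[key] = values
    return features
```

Typical use would be `with open('LIWC2015_Results.csv', newline='', encoding='utf-8-sig') as f: features = load_liwc_features(f)`, where `utf-8-sig` tolerates a leading byte-order mark if one is present.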

Convert the LIWC .dic dictionary into a .trie for faster pattern matching.
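A minimal sketch of the idea, assuming the usual LIWC .dic layout: a header of tab-separated `id<TAB>name` category lines enclosed between two `%` lines, followed by `pattern<TAB>id<TAB>id...` entries, where a trailing `*` makes the pattern a prefix wildcard. The function names are hypothetical, and this lookup prefers an exact word over a wildcard, then the longest matching wildcard. The sentinel keys are multi-character strings, so they never collide with the single-character edge keys and the nested dict stays picklable.

```python
from typing import Dict, Iterable, List

_END = '<end>'        # key marking a complete (exact) word
_PREFIX = '<prefix>'  # key marking a wildcard entry such as "abandon*"


def parse_dic(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each .dic pattern to its category names (assumed tab-separated format)."""
    names: Dict[str, str] = {}
    entries: Dict[str, List[str]] = {}
    percent_lines = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == '%':
            percent_lines += 1  # the category header sits between two '%' lines
            continue
        fields = [field for field in line.split('\t') if field]
        if percent_lines == 1:
            names[fields[0]] = fields[1]
        elif percent_lines >= 2:
            entries[fields[0]] = [names.get(c, c) for c in fields[1:]]
    return entries


def build_trie(entries: Dict[str, List[str]]) -> dict:
    """Build a character trie as nested dicts."""
    root: dict = {}
    for pattern, cats in entries.items():
        wildcard = pattern.endswith('*')
        word = pattern[:-1] if wildcard else pattern
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_PREFIX if wildcard else _END] = cats
    return root


def lookup(trie: dict, token: str) -> List[str]:
    """Return categories for token: exact match first, else longest wildcard prefix."""
    node = trie
    best: List[str] = []
    for ch in token:
        if _PREFIX in node:
            best = node[_PREFIX]  # the characters consumed so far match a wildcard
        if ch not in node:
            return best
        node = node[ch]
    if _END in node:
        return node[_END]
    return node.get(_PREFIX, best)
```

Each lookup costs time proportional to the token length, independent of dictionary size, which is the speed-up over testing every pattern in turn.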

Get a dictionary of Counter objects keyed by category. Each category's Counter records how many times each word belonging to that category is encountered.


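Example usage:

    import json
    # LiwcEntropy is defined in this repository; its module path is not given here.
    liwc = LiwcEntropy()
    with open('../data/txt/liwc.7_kg7BsyyTy8.txt') as f:
        document = ' '.join(f.readlines())
    categories = liwc.count_tokens_in_categories(document)
    print(json.dumps(categories, sort_keys=True, indent=4, separators=(',', ': ')))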
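To illustrate the structure returned by count_tokens_in_categories, here is a stand-alone sketch; it is not the repository's LiwcEntropy implementation. The regex tokenizer and the `categories_of` callback (for instance a trie lookup over the LIWC dictionary) are assumptions.

```python
import re
from collections import Counter, defaultdict
from typing import Callable, Dict, List


def count_tokens_in_categories(document: str,
                               categories_of: Callable[[str], List[str]]) -> Dict[str, Counter]:
    """Return {category: Counter(word -> occurrences)} for one document (sketch)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    for token in re.findall(r"[a-z']+", document.lower()):
        for category in categories_of(token):
            counts[category][token] += 1
    return dict(counts)
```

Because Counter is a dict subclass, the result serializes directly with json.dumps, as in the example above.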

Draw a heatmap in which cell (i, j) is the Jensen-Shannon divergence between user i and user j.
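A minimal sketch, assuming each user is represented by a Counter of token frequencies (for example, one of the per-category Counters described above) and using base-2 logarithms so that every divergence lies in [0, 1]. The function names are hypothetical. Plotting uses matplotlib's imshow and is imported lazily, so the divergence computation runs without matplotlib installed.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # Terms with p_i == 0 contribute 0 by convention.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_matrix(user_counts: Dict[str, Counter]) -> Tuple[List[str], List[List[float]]]:
    users = sorted(user_counts)
    vocab = sorted(set().union(*user_counts.values()))
    dists = []
    for u in users:
        total = sum(user_counts[u].values())  # assumes every user has at least one token
        dists.append([user_counts[u][w] / total for w in vocab])
    n = len(users)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # JS divergence is symmetric: fill both halves at once
            matrix[i][j] = matrix[j][i] = js_divergence(dists[i], dists[j])
    return users, matrix


def draw_heatmap(users: List[str], matrix: List[List[float]], path: str = 'js_heatmap.png') -> None:
    import matplotlib
    matplotlib.use('Agg')  # render to a file without a display
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    image = ax.imshow(matrix, cmap='viridis', vmin=0.0, vmax=1.0)
    ax.set_xticks(range(len(users)))
    ax.set_xticklabels(users, rotation=90)
    ax.set_yticks(range(len(users)))
    ax.set_yticklabels(users)
    fig.colorbar(image, ax=ax, label='Jensen-Shannon divergence (bits)')
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Unlike KL divergence, the Jensen-Shannon divergence is symmetric and always finite, which is why a single symmetric matrix suffices for the heatmap.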

Evaluation with SVM and VW (Vowpal Wabbit) classifiers using a feature file.

Usage: sh feature_file output-prefix
The script generates ROC curves using SVM and VW as classifiers and computes the 0/1 loss.
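The script itself drives SVM and VW. The sketch below only shows, under assumptions, how ROC points, the area under the curve, and the 0/1 loss can be computed from per-example scores and labels. It assumes label 1 marks the positive class and any other label (0 or -1) the negative class, and uses a decision threshold of 0, which suits SVM decision values and raw VW margins; adjust it if your scores are probabilities. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from the highest to the lowest threshold; tied scores share one point."""
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos  # assumes both classes are present
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def zero_one_loss(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of examples whose thresholded prediction disagrees with the label."""
    predictions = [1 if s > threshold else 0 for s in scores]
    actual = [1 if y == 1 else 0 for y in labels]
    return sum(p != a for p, a in zip(predictions, actual)) / len(labels)
```

The ROC curve uses only the ranking of scores, while the 0/1 loss depends on the chosen threshold, so the two metrics can disagree about which classifier is better.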