Han-Chin Shing
Bor-Chun Chen
###parse_data_for_LIWC.py
Convert tweets into txt files, one file per person and one line per tweet. These txt files are the input to LIWC2015 for content analysis.
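As a rough illustration (not the script's actual code), the conversion could look like the sketch below; the `write_liwc_txt` helper and the assumption that tweets arrive as (user_id, text) pairs are hypothetical.

```python
import os
from collections import defaultdict

def write_liwc_txt(tweets, out_dir='../data/txt'):
    """Group tweets by user and write one txt file per user, one tweet per line.

    `tweets` is assumed to be an iterable of (user_id, tweet_text) pairs;
    the real script may read them from JSON or CSV instead.
    """
    by_user = defaultdict(list)
    for user_id, text in tweets:
        # Keep each tweet on a single line so LIWC treats every line as one tweet.
        by_user[user_id].append(text.replace('\n', ' ').strip())
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    for user_id, lines in by_user.items():
        with open(os.path.join(out_dir, '%s.txt' % user_id), 'w') as f:
            f.write('\n'.join(lines) + '\n')
```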
###LIWC2015_Results.csv
Results of running LIWC2015 on the txt files generated by parse_data_for_LIWC.py.
###liwc_features.py
Load the results from LIWC2015_Results.csv into a feature dictionary.
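A minimal sketch of the loading step, assuming the CSV has a `Filename` column identifying each person and numeric LIWC category columns; the actual column names in the LIWC2015 output may differ.

```python
import csv

def load_liwc_features(path='LIWC2015_Results.csv'):
    """Read the LIWC2015 output into {filename: {category: score}}."""
    features = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            key = row.pop('Filename')  # assumed identifier column
            # The remaining columns are numeric LIWC category scores.
            features[key] = {cat: float(val) for cat, val in row.items()}
    return features
```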
###dic2trie.py
Convert a LIWC .dic file into a .trie for faster pattern matching.
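A rough sketch of the conversion, assuming the standard LIWC .dic layout (a `%`-delimited header mapping category IDs to names, followed by tab-separated word/category lines with `*` as a prefix wildcard); the pickled trie layout shown here is an assumption, not necessarily the script's actual format.

```python
import pickle

def dic_to_trie(dic_path, trie_path):
    """Build a character trie from a LIWC .dic file and pickle it.

    Trie nodes are nested dicts of characters; the special keys '$'
    (exact match) and '*' (prefix wildcard) store category name lists.
    """
    categories, trie, in_header = {}, {}, False
    with open(dic_path) as f:
        for line in f:
            line = line.strip()
            if line == '%':            # the header is delimited by two '%' lines
                in_header = not in_header
                continue
            if not line:
                continue
            parts = line.split('\t')
            if in_header:              # header lines: "<id>\t<category name>"
                categories[parts[0]] = parts[1]
            else:                      # entry lines: "<word>\t<id>\t<id>..."
                word, ids = parts[0], parts[1:]
                node = trie
                for ch in word.rstrip('*'):
                    node = node.setdefault(ch, {})
                node['*' if word.endswith('*') else '$'] = [categories[i] for i in ids]
    with open(trie_path, 'wb') as f:
        pickle.dump(trie, f)
```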
###liwc_entropy.py
Get a dictionary of Counter objects keyed by category. For each category, the associated Counter keeps track of which words are encountered.
Usage:
import json
from liwc_entropy import LiwcEntropy
liwc = LiwcEntropy()
document = ' '.join(open('../data/txt/liwc.7_kg7BsyyTy8.txt').readlines())
categories = liwc.count_tokens_in_categories(document)
print(json.dumps(categories, sort_keys=True, indent=4, separators=(',', ': ')))
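The counting logic is roughly as below; this sketch assumes the trie layout from the dic2trie.py sketch above and is not the actual implementation.

```python
from collections import Counter, defaultdict

def count_tokens_in_categories(document, trie):
    """Return {category: Counter of matched words} for one document."""
    counts = defaultdict(Counter)
    for token in document.lower().split():
        node = trie
        for ch in token:
            if '*' in node:            # wildcard entry: the prefix already matched
                for cat in node['*']:
                    counts[cat][token] += 1
                break
            node = node.get(ch)
            if node is None:           # no dictionary entry along this path
                break
        else:
            # The whole token was consumed: exact or wildcard match at this node.
            for cat in node.get('$', []) + node.get('*', []):
                counts[cat][token] += 1
    return dict(counts)
```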
###heatmap_liwc_entropy.py
Draw a heatmap where the (i, j) coordinate is the Jensen-Shannon divergence between user i and user j.
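For reference, a sketch of the pairwise computation, assuming each user is represented by a word-count Counter (e.g., one category's Counter from liwc_entropy.py); `js_divergence` and `plot_jsd_heatmap` are illustrative names, and numpy/matplotlib may not match the script's actual dependencies.

```python
import numpy as np
import matplotlib.pyplot as plt

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two count/probability vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def plot_jsd_heatmap(user_counters, out_path='jsd_heatmap.png'):
    """user_counters: {user_id: Counter of word counts} for every user."""
    users = sorted(user_counters)
    vocab = sorted(set().union(*(user_counters[u] for u in users)))
    vectors = [np.array([user_counters[u][w] for w in vocab], dtype=float) for u in users]
    n = len(users)
    mat = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            mat[i, j] = js_divergence(vectors[i], vectors[j])
    plt.imshow(mat, cmap='hot', interpolation='nearest')
    plt.colorbar(label='Jensen-Shannon divergence')
    plt.savefig(out_path)
```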
###run_svm.sh
Usage: `sh run_svm.sh feature_file output-prefix`

Generates an ROC curve using SVM and VW as classifiers and computes the 0/1 loss.
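As a rough illustration of what the SVM half of the evaluation does (the feature file format, classifier settings, train/test split, and output files here are assumptions), a scikit-learn sketch:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc, zero_one_loss
from sklearn.model_selection import train_test_split

def evaluate_svm(X, y, output_prefix):
    """Train a linear SVM, save ROC curve points, and report the 0/1 loss."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    clf = SVC(kernel='linear').fit(X_train, y_train)
    scores = clf.decision_function(X_test)
    fpr, tpr, _ = roc_curve(y_test, scores)
    print('AUC: %.3f' % auc(fpr, tpr))
    print('0/1 loss: %.3f' % zero_one_loss(y_test, clf.predict(X_test)))
    # Save (fpr, tpr) pairs so the ROC curve can be plotted later.
    np.savetxt(output_prefix + '_roc.txt', np.column_stack([fpr, tpr]))
```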