CMSC773-Schizophrenia-Detection

Han-Chin Shing
Bor-Chun Chen


###parse_data_for_LIWC.py

Converts tweets into txt files, one file per person and one line per tweet. These files are the input that LIWC2015 analyses.
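
The layout described above can be sketched as follows. This is only an illustration, not the script itself: the input shape (a dict mapping a user id to a list of tweet strings) and the helper name `write_user_files` are assumptions, since the raw tweet format is not described here.

```python
import os


def write_user_files(tweets_by_user, out_dir):
    """Write one txt file per user, one tweet per line (hypothetical helper)."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    paths = {}
    for user_id, tweets in tweets_by_user.items():
        path = os.path.join(out_dir, "%s.txt" % user_id)
        with open(path, "w", encoding="utf-8") as handle:
            for tweet in tweets:
                # Collapse internal whitespace (including newlines) so that
                # every tweet occupies exactly one line of the file.
                handle.write(" ".join(tweet.split()) + "\n")
        paths[user_id] = path
    return paths
```

Collapsing newlines inside a tweet matters because a multi-line tweet would otherwise break the one-line-per-tweet guarantee.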

###LIWC2015_Results.csv

Results of running LIWC2015 on the txt files generated by parse_data_for_LIWC.py.

###liwc_features.py

Loads the results in LIWC2015_Results.csv into a feature dictionary.
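
A minimal sketch of what such a loader might look like, using only the standard `csv` module. The function name `load_liwc_features` and the identifier column name `Filename` are assumptions for illustration; check the header row of the actual CSV before relying on them.

```python
import csv


def load_liwc_features(csv_file, id_column="Filename"):
    """Map each row's identifier to a dict of its numeric columns.

    csv_file is an open text file object; id_column is an assumed name.
    """
    features = {}
    for row in csv.DictReader(csv_file):
        key = row.pop(id_column)
        values = {}
        for name, value in row.items():
            try:
                values[name] = float(value)
            except (TypeError, ValueError):
                continue  # skip columns that are not numeric
        features[key] = values
    return features
```

It could be called as `load_liwc_features(open('LIWC2015_Results.csv', newline=''))`, which yields one feature dict per input txt file.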

###dic2trie.py

Converts a .dic file to a .trie file for faster pattern matching.
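
To show why a trie speeds up matching, here is a small in-memory sketch. It assumes that dictionary patterns are plain words, optionally ending in `*` as a prefix wildcard; the on-disk .trie format is not described here, so the nested-dict layout and the names `build_trie` and `lookup` are hypothetical.

```python
WILDCARD = "*"
# Integer marker keys cannot collide with single-character string keys.
EXACT, PREFIX = 0, 1


def build_trie(patterns):
    """patterns: dict mapping a pattern string to a list of category names."""
    root = {}
    for pattern, categories in patterns.items():
        node = root
        for char in pattern.rstrip(WILDCARD):
            node = node.setdefault(char, {})
        marker = PREFIX if pattern.endswith(WILDCARD) else EXACT
        node.setdefault(marker, []).extend(categories)
    return root


def lookup(trie, token):
    """Return the categories of every pattern that matches token."""
    found = []
    node = trie
    for char in token:
        # Any prefix pattern ending at this node matches the longer token.
        found.extend(node.get(PREFIX, []))
        node = node.get(char)
        if node is None:
            return found
    found.extend(node.get(PREFIX, []))
    found.extend(node.get(EXACT, []))
    return found
```

A lookup walks at most one node per character of the token, regardless of how many patterns the dictionary holds.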

###liwc_entropy.py

Builds a dictionary of Counter objects keyed by category. Each category's Counter keeps track of which words were encountered and how often.

Usage:

    import json
    from liwc_entropy import LiwcEntropy  # assumes the class is defined in liwc_entropy.py
    liwc = LiwcEntropy()
    with open('../data/txt/liwc.7_kg7BsyyTy8.txt') as f:
        document = ' '.join(f.readlines())
    categories = liwc.count_tokens_in_categories(document)
    print(json.dumps(categories, sort_keys=True, indent=4, separators=(',', ': ')))
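
The data structure described above can be illustrated with a standalone sketch. It is not the `LiwcEntropy` implementation: the toy `lexicon` argument, the whitespace tokenisation, and the names `count_by_category` and `entropy` are assumptions, and the entropy step is only inferred from the file name.

```python
import math
from collections import Counter, defaultdict


def count_by_category(document, lexicon):
    """lexicon: dict mapping a lowercase word to its category names."""
    counts = defaultdict(Counter)
    for token in document.lower().split():
        for category in lexicon.get(token, ()):
            counts[category][token] += 1
    return dict(counts)


def entropy(counter):
    """Shannon entropy, in bits, of the word distribution in one category."""
    total = sum(counter.values())
    return -sum((n / total) * math.log2(n / total) for n in counter.values())
```

A category in which a user always repeats the same word has entropy 0, while a category spread evenly over many words has high entropy.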

###heatmap_liwc_entropy.py

Draws a heatmap in which cell (i, j) is the Jensen-Shannon divergence between user i and user j.
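
For reference, the pairwise matrix behind such a heatmap could be computed as in the sketch below. It assumes each user is represented by a non-empty Counter of word counts; the function names are hypothetical and the plotting step is omitted.

```python
import math


def _kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p[w] == 0 contribute 0."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def js_divergence(counts_a, counts_b):
    """Jensen-Shannon divergence between two word-count mappings."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    p = {w: counts_a.get(w, 0) / total_a for w in vocab}
    q = {w: counts_b.get(w, 0) / total_b for w in vocab}
    # m is the midpoint distribution; it is non-zero wherever p or q is.
    m = {w: 0.5 * (p[w] + q[w]) for w in vocab}
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def divergence_matrix(users):
    """users: list of count mappings; returns a symmetric list of lists."""
    return [[js_divergence(a, b) for b in users] for a in users]
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared words), which gives the heatmap a fixed colour scale.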

###Evaluation with SVM and VW using a feature file

Usage:

    sh run_svm.sh feature_file output-prefix

This generates ROC curves using SVM and VW as classifiers and computes the 0/1 loss.
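
The two reported metrics can be illustrated with a small pure-Python sketch. It is independent of run_svm.sh, SVM, and VW: the function names are hypothetical, labels are assumed to be 0/1, and both classes are assumed to be present.

```python
def zero_one_loss(labels, predictions):
    """Fraction of examples whose predicted label differs from the true one."""
    mistakes = sum(1 for y, p in zip(labels, predictions) if y != p)
    return mistakes / len(labels)


def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    The decision threshold sweeps from the highest score to the lowest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so tied scores
        # are treated as a single threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points
```

The 0/1 loss depends on a single decision threshold, whereas the ROC curve summarises the classifier across all thresholds, which is why both are worth reporting.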