- Python 3.6
- JRE 1.8.0_131-b11
- FudanNLP 2.1
- NLTK 3.2.5
- jieba 0.39
- matplotlib 2.1.0
- numpy 1.13.3
- sklearn
- Enter the `/src/` folder.
- Put the `.txt` files, each of which contains all the text from one author, under the `./input/` folder.
- If you are working on Windows, run `run.bat` from cmd or just double-click it. This script extracts each author's writing features from their texts and generates several files in the `./output/` folder. Running `python run.py <a_author.txt>` runs the processing script for that author only. The generated files are:
  - `cleaned_<author>.txt`: the text after preprocessing
  - `cleaned_<author><number>.txt`: split pieces of the preprocessed file
  - `all_feat_<author>.txt`: features extracted from the whole cleaned text of the author
  - `feat_<author><number>.txt`: features extracted from the number-th piece of the cleaned file
  - `depen_<author><number>.txt`: sentence dependency features
  - `depen_text_<author>.txt`: the encoded text file
- After that, running `plot.bat` will launch the script that plots the related information in the `./output/` folder.
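
The output file names listed above follow a fixed prefix scheme, which can be turned into a small lookup when post-processing the `./output/` folder. The sketch below is an illustration only: `classify_output` is a hypothetical helper that is not part of this project, and it assumes that `<number>` is a run of digits at the end of the name and that author names do not themselves end in a digit.

```python
import re

# Prefixes taken from the file list above. "depen_text_" must be tried
# before "depen_", because both start with the same characters.
PREFIXES = [
    ("depen_text_", "encoded text"),
    ("all_feat_", "features of the whole cleaned text"),
    ("cleaned_", "cleaned text"),
    ("depen_", "sentence dependency features"),
    ("feat_", "features of one piece"),
]


def classify_output(filename):
    """Return (kind, author, piece_number) for an output file name, or None."""
    if not filename.endswith(".txt"):
        return None
    stem = filename[:-4]
    for prefix, kind in PREFIXES:
        if stem.startswith(prefix):
            rest = stem[len(prefix):]
            # Assumption: a trailing run of digits is the piece number.
            match = re.fullmatch(r"(.*?)(\d+)?", rest)
            author, number = match.group(1), match.group(2)
            return kind, author, int(number) if number else None
    return None
```

For a hypothetical file named `feat_alice3.txt`, this would return `("features of one piece", "alice", 3)`; names that match none of the prefixes return `None`.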
If you are working on another OS, sorry, no batch processing is available yet. The setup procedure was only tested on Windows 8.1, and cross-platform compatibility is not guaranteed.
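
On other systems, the per-author command `python run.py <a_author.txt>` can be looped by hand. The following is a minimal sketch, not a tested replacement for `run.bat`: it assumes that processing every author amounts to calling `run.py` once per `.txt` file in `./input/`, and that `run.py` accepts the bare file name rather than a full path. Neither assumption is confirmed here, and `build_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(input_dir="./input", script="run.py"):
    """Return one `python run.py <file>` command per .txt file in input_dir."""
    files = sorted(Path(input_dir).glob("*.txt"))
    # Assumption: run.py takes the file name only, not the full path.
    return [[sys.executable, script, f.name] for f in files]


def run_all(input_dir="./input", script="run.py"):
    """Run the processing script once per author file, stopping on the first failure."""
    for cmd in build_commands(input_dir, script):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the `/src/` folder would then process each author file in turn. Building the command list separately from running it makes it possible to inspect what would be executed before anything is launched.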
- Put the text file to be tested in the `./test/` folder.
- Run the batch file `test.bat`.