A typical text consists of sentences that are glued together in a systematic way to form a coherent discourse. Shallow discourse parsing is the task of parsing a piece of text into a set of discourse relations between two adjacent or non-adjacent discourse units. We call this task shallow discourse parsing because the relations in a text are not connected to one another to form a connected structure in the form of a tree or graph.
More details can be found on the CoNLL shared task website.
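For concreteness, in the CoNLL shared-task releases each discourse relation is stored as one JSON object per line. The sketch below shows one way such a file could be read; the field names (Arg1, Arg2, RawText, Sense) follow my recollection of that format and are assumptions, not taken from this repository.

```python
# Minimal sketch of reading a CoNLL-style relations file (one JSON object per
# line). Field names are assumed from the shared-task format, not taken from
# this repository's code.
import json

def load_relations(path):
    relations = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            rel = json.loads(line)
            arg1 = rel['Arg1']['RawText']      # text of the first argument
            arg2 = rel['Arg2']['RawText']      # text of the second argument
            senses = rel.get('Sense', [])      # empty/absent in *.nosense.json files
            relations.append((arg1, arg2, senses))
    return relations
```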
In this project, I extracted word-pair, production-rule, and dependency-rule features [1], and added first/last word-pair features [5] to improve accuracy. After extracting the features, I used mutual information to reduce the feature dimensionality and trained a MaxEnt classifier from NLTK. The final model reached an accuracy of 40.7% on the test set. More results can be found in the project report (in Chinese).
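The sketch below illustrates that pipeline; it is not the project's actual mytrain.py, and every function and variable name in it is an illustrative assumption. It extracts binary word-pair and first/last-pair features from the two arguments, keeps the highest mutual-information features with scikit-learn, and trains NLTK's MaxEnt classifier on the result.

```python
# A minimal sketch of the pipeline described above, not the project's code:
# word-pair and first/last-pair features, mutual-information feature
# selection with scikit-learn, and an NLTK MaxEnt classifier.
from nltk.classify import MaxentClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif


def extract_features(arg1_tokens, arg2_tokens):
    """Binary word-pair and first/last-pair features for one relation."""
    feats = {}
    for w1 in arg1_tokens:                       # Cartesian word pairs [1]
        for w2 in arg2_tokens:
            feats['wp=%s|%s' % (w1.lower(), w2.lower())] = 1
    if arg1_tokens and arg2_tokens:              # first/last word pairs [5]
        feats['first=%s|%s' % (arg1_tokens[0].lower(), arg2_tokens[0].lower())] = 1
        feats['last=%s|%s' % (arg1_tokens[-1].lower(), arg2_tokens[-1].lower())] = 1
    return feats


def select_by_mutual_info(featuresets, labels, k=10000):
    """Keep the k features with the highest mutual information with the label."""
    vec = DictVectorizer()
    X = vec.fit_transform(featuresets)           # sparse binary matrix
    selector = SelectKBest(mutual_info_classif, k=min(k, X.shape[1])).fit(X, labels)
    names = vec.get_feature_names_out()          # scikit-learn >= 1.0
    keep = {names[i] for i in selector.get_support(indices=True)}
    reduced = [{f: v for f, v in fs.items() if f in keep} for fs in featuresets]
    return reduced, keep


def train(relations):
    """relations: list of (arg1_tokens, arg2_tokens, sense) triples."""
    featuresets = [extract_features(a1, a2) for a1, a2, _ in relations]
    labels = [sense for _, _, sense in relations]
    featuresets, kept = select_by_mutual_info(featuresets, labels)
    train_toks = list(zip(featuresets, labels))
    classifier = MaxentClassifier.train(train_toks, algorithm='iis',
                                        max_iter=20, trace=0)
    return classifier, kept
```

Production-rule and dependency-rule features would be added to the same feature dictionaries; they are omitted here because they require an external parser.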
data/: data files for training
lib/: open-source tool libraries used for feature extraction
model/: saved models
test/: directory for files saved during testing
cleandata.py: data-cleaning functions
config.py: constants used by the programs
mytest.py: test program
mytrain.py: training program
preprocess.py: functions for generating training data
scorer.py: standard scorer program
predict.json: default test output
Java >= 1.8.0, NLTK 3.0.0, scikit-learn
For training: python mytrain.py. The default output is 'train.model'.
For testing:
usage: mytest.py [-h] [rule] file

Test the model with the given options.
  rule    all: generate dependency rules and production rules
          drule: generate dependency rules
          prule: generate production rules
          none: use previously generated rules
  file    test data file (required)
example:
python mytest.py all test_pdtb.nosense.json
The default output is 'predict.json'
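As a rough picture of what the test step produces, the sketch below classifies each relation in the no-sense test file and writes it back out with a predicted sense, yielding a predict.json that can then be scored. This is not mytest.py: the JSON field names are assumptions, and the word-pair features are a simplified stand-in for the full feature set described above.

```python
# Hypothetical sketch of the test step: classify each relation and write
# predict.json. Field names and the simplified features are assumptions.
import json

def word_pairs(arg1_text, arg2_text):
    # Cartesian word pairs between the two argument texts.
    return {'wp=%s|%s' % (w1.lower(), w2.lower()): 1
            for w1 in arg1_text.split() for w2 in arg2_text.split()}

def write_predictions(classifier, in_path='test_pdtb.nosense.json',
                      out_path='predict.json'):
    with open(in_path, encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            rel = json.loads(line)
            feats = word_pairs(rel['Arg1']['RawText'], rel['Arg2']['RawText'])
            rel['Sense'] = [classifier.classify(feats)]   # trained NLTK classifier
            fout.write(json.dumps(rel) + '\n')
```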
If you have any problems running the program, contact ahshenbingyu@163.com.
[1] Lin, Z., Kan, M. Y., & Ng, H. T. (2009). Recognizing implicit discourse relations in the Penn Discourse Treebank.
[2] Chen, D., & Manning, C. D. (2014). A fast and accurate dependency parser using neural networks.
[3] Ji, Y., & Eisenstein, J. (2014). One vector is not enough: Entity-augmented distributional semantics for discourse relations.
[4] Lin, Z., Ng, H. T., & Kan, M. Y. (2010). A PDTB-styled end-to-end discourse parser.
[5] Pitler, E., Louis, A., & Nenkova, A. (2009). Automatic sense prediction for implicit discourse relations in text.