NLP dependency parsing using a structured perceptron and the Chu-Liu-Edmonds algorithm. The task is to build an unlabeled dependency parse tree for a POS-tagged sentence.
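As a rough illustration of the graph-based approach, the sketch below shows a structured-perceptron update over arc features. The helper names (`extract_features`, `decode`, `perceptron_update`) and the feature templates are illustrative assumptions, not the repository's actual API; for brevity, decoding is a per-token argmax over heads rather than the Chu-Liu-Edmonds maximum spanning arborescence implemented in `chu_liu.py`.

```python
# Hypothetical sketch of the structured-perceptron update for graph-based
# dependency parsing; names and feature templates are illustrative only.
from collections import defaultdict

def extract_features(words, tags, head, dep):
    """Features for the arc head -> dep (index 0 is the artificial ROOT)."""
    h_tag = 'ROOT' if head == 0 else tags[head - 1]
    h_word = 'ROOT' if head == 0 else words[head - 1]
    d_tag = tags[dep - 1]
    return [
        ('h_tag,d_tag', h_tag, d_tag),
        ('h_word,d_tag', h_word, d_tag),
        ('dist', h_tag, d_tag, min(abs(head - dep), 5)),
    ]

def arc_score(weights, words, tags, head, dep):
    return sum(weights[f] for f in extract_features(words, tags, head, dep))

def decode(weights, words, tags):
    """Greedy stand-in for Chu-Liu-Edmonds: each token picks its best head.
    (May not form a well-formed tree; the real parser uses chu_liu.py.)"""
    n = len(words)
    heads = {}
    for dep in range(1, n + 1):
        candidates = [h for h in range(n + 1) if h != dep]
        heads[dep] = max(candidates,
                         key=lambda h: arc_score(weights, words, tags, h, dep))
    return heads

def perceptron_update(weights, words, tags, gold_heads):
    """One structured-perceptron step: w += phi(gold) - phi(predicted)."""
    pred_heads = decode(weights, words, tags)
    for dep, gold_h in gold_heads.items():
        if pred_heads[dep] != gold_h:
            for f in extract_features(words, tags, gold_h, dep):
                weights[f] += 1.0
            for f in extract_features(words, tags, pred_heads[dep], dep):
                weights[f] -= 1.0

# Toy usage: heads are 1-based token indices, 0 means ROOT.
weights = defaultdict(float)
words = ['They', 'eat', 'fish']
tags = ['PRP', 'VBP', 'NN']
gold = {1: 2, 2: 0, 3: 2}
for _ in range(5):
    perceptron_update(weights, words, tags, gold)
print(decode(weights, words, tags))
```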
Library | Version |
---|---|
Python | 3.5.5 (Anaconda) |
numpy | 1.14.5 |
matplotlib | 3.0.0 |
File name | Purpose |
---|---|
DepOptimizer.py | Helper class for the Chu-Liu-Edmonds algorithm |
DependencyParser.py | Parser class, including the Structured Perceptron |
ProgressBar.py | Progress bar class |
generate_competition_files.py | Example of generating trees for unlabeled files |
chu_liu.py | Chu-Liu-Edmonds algorithm |
features.py | Old feature functions, more modular |
features_v2.py | Modified features, including McDonald features, less modular |
dependency_parser.py | Example of training a model |
calc_accuracy_main.py | Example of evaluating a model |
utils.py | Other utility functions |
FeaturesAnalysis.ipynb | Feature analysis |
ModelsComparison.ipynb | Comparison of various models |
DataAnalysis.ipynb | Initial analysis of the problem |
*.wtag | Labeled samples |
*.unlabeld | Samples without labels |
*.weights | Checkpoint files for the models' weights (inference / continued training) |
- Prepare the two pretrained models from `/pretrained`
- Evaluate them on the test file
- Generate labels for the unlabeled file
- Validate the generated file (accuracy should be 100%; a minimal accuracy check is sketched below)

`python generate_competition_files.py`
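As a rough sketch of the validation idea, unlabeled attachment accuracy between two head assignments could be computed as below. The sentence representation (a list of head indices per token) is an assumption for illustration, not the repository's actual `.wtag`/`.unlabeld` file layout.

```python
# Hypothetical unlabeled-attachment-score (UAS) check; the input format
# (lists of head indices per sentence) is an assumption, not the actual
# file format used by the repository.
def unlabeled_attachment_score(gold_sentences, pred_sentences):
    """gold_sentences / pred_sentences: lists of head-index lists, one per sentence."""
    correct = total = 0
    for gold_heads, pred_heads in zip(gold_sentences, pred_sentences):
        assert len(gold_heads) == len(pred_heads)
        correct += sum(g == p for g, p in zip(gold_heads, pred_heads))
        total += len(gold_heads)
    return correct / total

# Comparing the generated file against the model's own predictions should
# reproduce them exactly, hence the expected accuracy of 100%.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 2], [0, 1]]
print(unlabeled_attachment_score(gold, pred))  # 1.0
```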
- To train a model, change the model parameters (features, feature threshold, number of training iterations) in `dependency_parser.py` (a feature-threshold sketch follows below)
- Run `python dependency_parser.py`
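The feature threshold typically means dropping features that occur fewer than a given number of times in the training data before training. A minimal sketch of such count-based pruning, with hypothetical names that are not `dependency_parser.py`'s actual parameters:

```python
# Hypothetical count-based feature pruning, one plausible reading of the
# "feature threshold" parameter; the names here are illustrative only.
from collections import Counter

def prune_features(feature_lists, threshold):
    """Keep features seen at least `threshold` times and map each survivor
    to an index in the weight vector."""
    counts = Counter(f for feats in feature_lists for f in feats)
    kept = sorted(f for f, c in counts.items() if c >= threshold)
    return {f: i for i, f in enumerate(kept)}

# Toy usage: three sentences' worth of extracted arc features.
feature_lists = [
    [('h_tag,d_tag', 'VBP', 'PRP'), ('h_tag,d_tag', 'VBP', 'NN')],
    [('h_tag,d_tag', 'VBP', 'PRP')],
    [('h_tag,d_tag', 'NN', 'DT')],
]
print(prune_features(feature_lists, threshold=2))
```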