NLP dependency parsing using a structured perceptron and the Chu-Liu-Edmonds algorithm. The task is to build an unlabeled dependency parse tree for a POS-tagged sentence.
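As a rough illustration of the graph-based approach, the sketch below shows a structured-perceptron update over arc features. The helper names (`extract_features`, `decode`, `perceptron_update`) and the feature templates are illustrative assumptions, not the repository's actual API; for brevity, decoding is a per-token argmax over heads rather than the Chu-Liu-Edmonds maximum spanning arborescence implemented in `chu_liu.py`.

```python
# Hypothetical sketch of the structured-perceptron update for graph-based
# dependency parsing; names and feature templates are illustrative only.
from collections import defaultdict

def extract_features(words, tags, head, dep):
    """Features for the arc head -> dep (index 0 is the artificial ROOT)."""
    h_tag = 'ROOT' if head == 0 else tags[head - 1]
    h_word = 'ROOT' if head == 0 else words[head - 1]
    d_tag = tags[dep - 1]
    return [
        ('h_tag,d_tag', h_tag, d_tag),
        ('h_word,d_tag', h_word, d_tag),
        ('dist', h_tag, d_tag, min(abs(head - dep), 5)),
    ]

def arc_score(weights, words, tags, head, dep):
    return sum(weights[f] for f in extract_features(words, tags, head, dep))

def decode(weights, words, tags):
    """Greedy stand-in for Chu-Liu-Edmonds: each token picks its best head.
    (May not form a well-formed tree; the real parser uses chu_liu.py.)"""
    n = len(words)
    heads = {}
    for dep in range(1, n + 1):
        candidates = [h for h in range(n + 1) if h != dep]
        heads[dep] = max(candidates,
                         key=lambda h: arc_score(weights, words, tags, h, dep))
    return heads

def perceptron_update(weights, words, tags, gold_heads):
    """One structured-perceptron step: w += phi(gold) - phi(predicted)."""
    pred_heads = decode(weights, words, tags)
    for dep, gold_h in gold_heads.items():
        if pred_heads[dep] != gold_h:
            for f in extract_features(words, tags, gold_h, dep):
                weights[f] += 1.0
            for f in extract_features(words, tags, pred_heads[dep], dep):
                weights[f] -= 1.0

# Toy usage: heads are 1-based token indices, 0 means ROOT.
weights = defaultdict(float)
words = ['They', 'eat', 'fish']
tags = ['PRP', 'VBP', 'NN']
gold = {1: 2, 2: 0, 3: 2}
for _ in range(5):
    perceptron_update(weights, words, tags, gold)
print(decode(weights, words, tags))
```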
Library | Version |
---|---|
Python | 3.5.5 (Anaconda) |
numpy | 1.14.5 |
matplotlib | 3.0.0 |
File name | Purpose |
---|---|
DepOptimizer.py | Helper class for the Chu-Liu-Edmonds algorithm |
DependencyParser.py | Parser class, including the Structured Perceptron |
ProgressBar.py | Progress bar class |
generate_competition_files.py | Example of generating trees for unlabeled files |
chu_liu.py | Chu-Liu-Edmonds algorithm |
features.py | Old feature functions, more modular |
features_v2.py | Modified features, including McDonald features, less modular |
dependency_parser.py | Example of training a model |
calc_accuracy_main.py | Example of evaluating a model |
utils.py | Other utility functions |
FeaturesAnalysis.ipynb | Feature analysis |
ModelsComparison.ipynb | Comparison of various models |
DataAnalysis.ipynb | Initial analysis of the problem |
*.wtag | Labeled samples |
*.unlabeld | Samples without labels |
*.weights | Checkpoint files for the models' weights (inference / continued training) |
- Prepare the two pretrained models from `/pretrained`
- Evaluate them on the test file
- Generate labels for the unlabeled file
- Validate the generated file (accuracy should be 100%; a minimal accuracy check is sketched below)

`python generate_competition_files.py`
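As a rough sketch of the validation idea, unlabeled attachment accuracy between two head assignments could be computed as below. The sentence representation (a list of head indices per token) is an assumption for illustration, not the repository's actual `.wtag`/`.unlabeld` file layout.

```python
# Hypothetical unlabeled-attachment-score (UAS) check; the input format
# (lists of head indices per sentence) is an assumption, not the actual
# file format used by the repository.
def unlabeled_attachment_score(gold_sentences, pred_sentences):
    """gold_sentences / pred_sentences: lists of head-index lists, one per sentence."""
    correct = total = 0
    for gold_heads, pred_heads in zip(gold_sentences, pred_sentences):
        assert len(gold_heads) == len(pred_heads)
        correct += sum(g == p for g, p in zip(gold_heads, pred_heads))
        total += len(gold_heads)
    return correct / total

# Comparing the generated file against the model's own predictions should
# reproduce them exactly, hence the expected accuracy of 100%.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 2], [0, 1]]
print(unlabeled_attachment_score(gold, pred))  # 1.0
```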
- To train a model, change the model parameters (features, feature threshold, number of training iterations) in `dependency_parser.py` (a feature-threshold sketch follows below)
- Run `python dependency_parser.py`
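The feature threshold typically means dropping features that occur fewer than a given number of times in the training data before training. A minimal sketch of such count-based pruning, with hypothetical names that are not `dependency_parser.py`'s actual parameters:

```python
# Hypothetical count-based feature pruning, one plausible reading of the
# "feature threshold" parameter; the names here are illustrative only.
from collections import Counter

def prune_features(feature_lists, threshold):
    """Keep features seen at least `threshold` times and map each survivor
    to an index in the weight vector."""
    counts = Counter(f for feats in feature_lists for f in feats)
    kept = sorted(f for f, c in counts.items() if c >= threshold)
    return {f: i for i, f in enumerate(kept)}

# Toy usage: three sentences' worth of extracted arc features.
feature_lists = [
    [('h_tag,d_tag', 'VBP', 'PRP'), ('h_tag,d_tag', 'VBP', 'NN')],
    [('h_tag,d_tag', 'VBP', 'PRP')],
    [('h_tag,d_tag', 'NN', 'DT')],
]
print(prune_features(feature_lists, threshold=2))
```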