An implementation of PCFG-BCL, the grammar induction algorithm by Kewei Tu and Vasant Honavar [1].
PCFG-BCL is an unsupervised algorithm that learns a probabilistic context-free grammar (PCFG) from positive samples. It acquires the rules of an unknown PCFG by iteratively biclustering the bigrams of the training corpus.
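To make the bigram-biclustering idea concrete, here is a minimal, stdlib-only sketch of one iteration on a toy corpus. It is not the code in `pcfg_bcl.py`. It assumes the bigram table has left symbols as rows and right symbols as columns. It checks whether a candidate bicluster is exactly multiplicatively coherent (every cell equals row sum × column sum / total). It then turns the bicluster into rules N → A B, where A and B rewrite to the row and column symbols in proportion to their sums, and it rewrites the corpus by replacing matching bigrams with N so that the next iteration can cluster over the new symbol. The function names, the `N_A`/`N_B` naming and the greedy left-to-right reduction are assumptions made for illustration. The paper selects biclusters with a probabilistic scoring criterion that tolerates noisy counts, and this sketch does not reproduce that criterion.

```python
from collections import Counter


def bigram_table(corpus):
    """Count adjacent (left, right) symbol pairs over all sentences."""
    table = Counter()
    for sentence in corpus:
        for left, right in zip(sentence, sentence[1:]):
            table[(left, right)] += 1
    return table


def bicluster_stats(table, rows, cols):
    """Cell counts, row sums, column sums and total of the rows x cols block."""
    cells = {(r, c): table[(r, c)] for r in rows for c in cols}
    row_sums = {r: sum(cells[(r, c)] for c in cols) for r in rows}
    col_sums = {c: sum(cells[(r, c)] for r in rows) for c in cols}
    return cells, row_sums, col_sums, sum(cells.values())


def is_multiplicatively_coherent(table, rows, cols):
    """Exact check: every cell equals row_sum * col_sum / total (rank-one block).

    Simplification: real corpora need a tolerant, probabilistic criterion.
    """
    cells, row_sums, col_sums, total = bicluster_stats(table, rows, cols)
    return all(cells[(r, c)] * total == row_sums[r] * col_sums[c]
               for r in rows for c in cols)


def bicluster_to_rules(table, rows, cols, new_sym="N"):
    """Rules N -> A B, A -> row symbol, B -> column symbol (hypothetical naming)."""
    _, row_sums, col_sums, total = bicluster_stats(table, rows, cols)
    a, b = new_sym + "_A", new_sym + "_B"
    rules = {(new_sym, (a, b)): 1.0}
    for r in rows:
        rules[(a, (r,))] = row_sums[r] / float(total)
    for c in cols:
        rules[(b, (c,))] = col_sums[c] / float(total)
    return rules


def reduce_corpus(corpus, rows, cols, new_sym="N"):
    """Greedily replace left-to-right occurrences of bicluster bigrams with N."""
    reduced = []
    for sentence in corpus:
        out, i = [], 0
        while i < len(sentence):
            if (i + 1 < len(sentence) and sentence[i] in rows
                    and sentence[i + 1] in cols):
                out.append(new_sym)
                i += 2
            else:
                out.append(sentence[i])
                i += 1
        reduced.append(out)
    return reduced


if __name__ == "__main__":
    corpus = [s.split() for s in ["the dog barks", "a dog barks",
                                   "the cat barks", "a cat barks"]]
    table = bigram_table(corpus)
    rows, cols = {"the", "a"}, {"dog", "cat"}
    print(is_multiplicatively_coherent(table, rows, cols))           # True
    print(bicluster_to_rules(table, rows, cols)[("N_A", ("the",))])  # 0.5
    print(reduce_corpus(corpus, rows, cols)[0])                      # ['N', 'barks']
```

On the reduced toy corpus, the next iteration would see the bigram (N, barks) in every sentence, which is how repeated biclustering can build deeper rules on top of earlier ones.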
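Usage:

    tugram.py learning_corpus generated_grammar

where `learning_corpus` is the input corpus and `generated_grammar` is where the learned grammar is written.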
- `tugram.py` - Main script. Learns a PCFG (output) from a learning corpus (input).
- `pcfg_bcl.py` - PCFG-BCL implementation.
- `grammars.py` - Functions used to generate test corpora from PCFGs.
- `test.py` - Tests from Section 5 of the paper [1].
- `*.txt` - Test corpora.
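`grammars.py` generates the test corpora from PCFGs. Its functions are not reproduced here. As an illustration of the general idea only, the stdlib-only sketch below samples sentences from a small grammar by expanding nonterminals top-down and picking each rule with its probability. `TOY_PCFG`, `sample` and `generate_corpus` are hypothetical names, and the toy grammar is not one of the test grammars.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (right-hand side, probability).
# Symbols that are not keys of the dict are treated as terminals.
TOY_PCFG = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 1.0)],
    "VP": [(("V",), 0.6), (("V", "NP"), 0.4)],
    "Det": [(("the",), 0.5), (("a",), 0.5)],
    "N": [(("dog",), 0.5), (("cat",), 0.5)],
    "V": [(("sees",), 0.5), (("barks",), 0.5)],
}


def sample(grammar, symbol="S", rng=random):
    """Expand `symbol` top-down, choosing each rule with its probability."""
    if symbol not in grammar:
        return [symbol]
    threshold, acc = rng.random(), 0.0
    # If rounding leaves acc just below 1, the loop ends on the last rule.
    for rhs, prob in grammar[symbol]:
        acc += prob
        if threshold < acc:
            break
    words = []
    for child in rhs:
        words.extend(sample(grammar, child, rng))
    return words


def generate_corpus(grammar, n, seed=0):
    """Draw n sentences with a seeded generator so the corpus is reproducible."""
    rng = random.Random(seed)
    return [" ".join(sample(grammar, "S", rng)) for _ in range(n)]


if __name__ == "__main__":
    for sentence in generate_corpus(TOY_PCFG, 5, seed=42):
        print(sentence)
```

Seeding a private `random.Random` instance keeps generated corpora reproducible across runs without touching the global random state.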
Corpus | Precision (%) | Recall (%) | F-score (%) |
---|---|---|---|
Baseline | 90.0 | 100 | 93.3 |
Num-agr | 45.5 | 100 | 61.8 |
Langley1 | 88.0 | 100 | 89.4 |
Langley2 | 100 | 100 | 100 |
Requirements:

- Python 2.7+
- nltk
- numpy
- pandas
- coclust
[1] Tu, K., & Honavar, V. (2008, September). Unsupervised learning of probabilistic context-free grammar using iterative biclustering. In ICGI (pp. 224-237).