DLM-ICM-baselines

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

The code in this repository generates random baseline trees that match the natural-language dependency trees in number of nodes, number of crossings, and the distributions of dependency length and intervener complexity. See this paper for more details.
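To make the matched properties concrete, here is a minimal sketch of how they could be computed for a single tree. It assumes a tree is stored as a list `heads` where `heads[i]` is the 1-based head of word i+1 and 0 marks the root (the CoNLL-U HEAD convention). The function names are hypothetical, not the repository's. Intervener complexity is taken here to be the number of words strictly between a head and its dependent that are themselves heads of some word; this is one common definition, so check the paper for the exact one used.

```python
from typing import List, Tuple

# Assumed representation: heads[i] is the 1-based head of word i+1; 0 marks the root.


def arcs(heads: List[int]) -> List[Tuple[int, int]]:
    """Return (head, dependent) pairs, skipping the arc from the artificial root."""
    return [(h, d) for d, h in enumerate(heads, start=1) if h != 0]


def dependency_lengths(heads: List[int]) -> List[int]:
    """Linear distance between each head and its dependent."""
    return [abs(h - d) for h, d in arcs(heads)]


def num_crossings(heads: List[int]) -> int:
    """Count pairs of arcs that cross when drawn above the sentence."""
    spans = [tuple(sorted(a)) for a in arcs(heads)]
    count = 0
    for i in range(len(spans)):
        a, b = spans[i]
        for c, d in spans[i + 1:]:
            # Arcs cross when exactly one endpoint of one lies strictly inside the other.
            if a < c < b < d or c < a < d < b:
                count += 1
    return count


def intervener_complexity(heads: List[int]) -> List[int]:
    """Per arc: number of intervening words that head at least one other word (assumed definition)."""
    is_head = {h for h in heads if h != 0}
    result = []
    for h, d in arcs(heads):
        lo, hi = sorted((h, d))
        result.append(sum(1 for k in range(lo + 1, hi) if k in is_head))
    return result
```

For a small non-projective tree such as `heads = [3, 4, 0, 3]`, this gives dependency lengths `[2, 2, 1]`, one crossing, and intervener complexities `[0, 1, 0]`.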

Each module named "consruct_output... .py" takes a directory of CoNLL-U files and computes the formal measures for the real trees and for their corresponding random baseline trees. There are six such modules, one for each of the six baselines. The computed measures (e.g., gap degree and edge degree) for the real trees and the random baseline trees are stored in the same output file.
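A rough sketch of the real-tree side of that pipeline follows, under several assumptions: input files end in `.conllu`, multiword-token lines and empty nodes are skipped, arcs from the artificial root are ignored, and the helper names (`parse_conllu`, `read_directory`, `gap_degree`, `edge_degree`) are hypothetical rather than taken from the repository. Gap degree is computed with the usual definition (the maximum number of discontinuities in any node's projection). Edge degree counts, for each arc, the connected components inside the arc's span whose root is not dominated by the arc's head, and takes the maximum over arcs.

```python
import os
from typing import Dict, Iterable, Iterator, List


def parse_conllu(lines: Iterable[str]) -> Iterator[List[int]]:
    """Yield one head list per sentence from CoNLL-U lines."""
    heads: List[int] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if heads:
                yield heads
                heads = []
        elif line.startswith("#"):
            continue  # sentence-level comments
        else:
            cols = line.split("\t")
            # Skip multiword-token ranges ("3-4") and empty nodes ("5.1").
            if "-" in cols[0] or "." in cols[0]:
                continue
            heads.append(int(cols[6]))  # column 7 is HEAD
    if heads:
        yield heads


def read_directory(directory: str) -> Iterator[List[int]]:
    """Yield head lists for every sentence in every .conllu file (assumed extension)."""
    for name in sorted(os.listdir(directory)):
        if name.endswith(".conllu"):
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                yield from parse_conllu(f)


def children_map(heads: List[int]) -> Dict[int, List[int]]:
    kids: Dict[int, List[int]] = {i: [] for i in range(len(heads) + 1)}
    for d, h in enumerate(heads, start=1):
        kids[h].append(d)
    return kids


def projection(node: int, kids: Dict[int, List[int]]) -> List[int]:
    """Sorted positions of a node and all of its descendants."""
    stack, seen = [node], []
    while stack:
        n = stack.pop()
        seen.append(n)
        stack.extend(kids[n])
    return sorted(seen)


def gap_degree(heads: List[int]) -> int:
    kids = children_map(heads)
    best = 0
    for node in range(1, len(heads) + 1):
        proj = projection(node, kids)
        gaps = sum(1 for a, b in zip(proj, proj[1:]) if b - a > 1)
        best = max(best, gaps)
    return best


def dominates(anc: int, node: int, heads: List[int]) -> bool:
    """True if anc is node or one of its ancestors (assumes an acyclic tree)."""
    while node != 0:
        if node == anc:
            return True
        node = heads[node - 1]
    return False


def edge_degree(heads: List[int]) -> int:
    best = 0
    for d, h in enumerate(heads, start=1):
        if h == 0:
            continue  # ignore the arc from the artificial root
        lo, hi = sorted((h, d))
        degree = 0
        for k in range(lo + 1, hi):
            # k is the root of a component in the span iff its own head lies outside the span.
            if not (lo < heads[k - 1] < hi) and not dominates(h, k, heads):
                degree += 1
        best = max(best, degree)
    return best
```

A hypothetical usage would be `for heads in read_directory("path/to/treebank"): print(gap_degree(heads), edge_degree(heads))`.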

The modules named "baseline_conditions... .py" contain the algorithms that generate trees for the different baselines. For more about these baselines and the measures being computed, see this paper.
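The repository's own baseline algorithms are not reproduced here. As a hypothetical illustration of the general idea only, the sketch below builds a random rooted tree with the same number of nodes as a real tree, assigns its words a random linear order, and uses rejection sampling until the number of crossings matches the real tree. It does not match dependency-length or intervener-complexity distributions, and the random-attachment scheme (each new node attaches to a uniformly chosen existing node) is an assumption; it is not uniform over all trees. All function names are hypothetical.

```python
import random
from typing import List, Optional


def num_crossings(heads: List[int]) -> int:
    """Count crossing arc pairs; heads are 1-based with 0 for the root."""
    spans = [tuple(sorted((h, d))) for d, h in enumerate(heads, start=1) if h != 0]
    return sum(
        1
        for i, (a, b) in enumerate(spans)
        for (c, d) in spans[i + 1:]
        if a < c < b < d or c < a < d < b
    )


def random_tree(n: int, rng: random.Random) -> List[int]:
    """Random recursive tree on n nodes, randomly linearised, as CoNLL-U-style heads."""
    order = list(range(n))
    rng.shuffle(order)  # order[k] = 0-based sentence position of the k-th created node
    heads = [0] * n     # the first created node, at position order[0], stays the root
    for k in range(1, n):
        parent = rng.randrange(k)  # attach to any node created earlier
        heads[order[k]] = order[parent] + 1
    return heads


def matched_baseline(
    heads: List[int], rng: random.Random, max_tries: int = 10_000
) -> Optional[List[int]]:
    """Rejection-sample a random tree with the same size and crossing count, or None."""
    target = num_crossings(heads)
    for _ in range(max_tries):
        candidate = random_tree(len(heads), rng)
        if num_crossings(candidate) == target:
            return candidate
    return None
```

Rejection sampling keeps the sketch short, but acceptance rates can drop sharply for long sentences or unusual crossing counts, which is why the function gives up after `max_tries` and returns `None`.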

To install the required libraries, run: pip install -r requirements.txt