Code and paper for Cogsci 2019.
Languages display a diverse set of distributional regularities such as the relation between a word's frequency and rank in a corpus, the distribution of dependency lengths, or the presence of lexical properties such as ambiguity. We discuss a framework for studying how these properties emerge from in-the-moment interactions of rational, pragmatic speakers and listeners. Our work takes Zipfian notions of lexicon-level efficiency as a starting point, connecting these ideas to Gricean notions of conversational-level efficiency. To do so, we derive an objective function for measuring the communicative efficiency of linguistic systems and then examine the behavior of this objective in a series of simulations focusing on the communicative function of ambiguity in language. These simulations suggest that rational pragmatic agents will produce communicatively efficient systems and that interactions between such agents provide a framework for examining efficient properties of language more broadly.
To generate the simulated data for this section, run the following from the repo root:
>>> python -m ambiguity.run --sim-type context --out-dir your_output_dir
Alternatively, the simulation data from the paper is available upon request (bpeloqui@stanford.edu).
To generate the plots, run the code in ambiguity/simulation-runners/context-ambiguity-plots.Rmd, pointing it to your local file paths.
To generate the simulated data, run the code chunks in ambiguity/simulation-runners/discourse-ambiguity.Rmd.
Alternatively, the simulation data from the paper is available upon request.
To generate the plots, run the code in ambiguity/simulation-runners/discourse-ambiguity-plots.Rmd, pointing it to your local file paths.
We provide a complete derivation of the Zipfian-inspired objective for language efficiency, as well as additional modeling details, in the supplemental materials: paper/supplementary_materials.pdf.
zipf_principles/ambiguity/...
run.py: primary run script for Simulation 1.
config.py: contains configurations used for Simulation 1.
objectives.py: contains various objectives, including our derived speaker-listener cross-entropy as well as several comparisons not included in the paper.
simulations.py: primary simulation infrastructure. Note that we run "context" simulations in the current work.
agents.py: basic matrix manipulation as well as matrix-defined RSA agents.
utils.py: general utilities.
simulation-runners: contains simulation run and plotting functionality (detailed above).
zipf_principles/ambiguity/webppl/...
discourse_ambiguity_model.wppl: contains Simulation 2 model code. For best viewing, set interpreter to javascript.
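For readers unfamiliar with matrix-defined RSA agents or a speaker-listener cross-entropy, the following is a minimal, dependency-free sketch of the general idea. It is not the code in agents.py or objectives.py: every function name here is hypothetical, the recursion is the textbook RSA form (literal listener, pragmatic speaker, pragmatic listener), and the cross-entropy assumes joint distributions P_S(u, m) = S(u | m) P(m) and P_L(u, m) = L(m | u) P(u) with a uniform utterance prior. See paper/supplementary_materials.pdf for the actual derivation.

```python
import math


def normalize(weights):
    """Scale non-negative weights to sum to 1 (an all-zero row stays zero)."""
    total = sum(weights)
    return [w / total if total > 0 else 0.0 for w in weights]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def literal_listener(lexicon, meaning_prior):
    """L0(m | u) proportional to lexicon[u][m] * P(m); rows index utterances."""
    return [normalize([cell * p for cell, p in zip(row, meaning_prior)])
            for row in lexicon]


def pragmatic_speaker(listener, alpha=1.0):
    """S(u | m) proportional to L(m | u) ** alpha; rows index meanings."""
    return [normalize([p ** alpha for p in col]) for col in transpose(listener)]


def pragmatic_listener(speaker, meaning_prior):
    """L1(m | u) proportional to S(u | m) * P(m); rows index utterances."""
    return [normalize([s * p for s, p in zip(row, meaning_prior)])
            for row in transpose(speaker)]


def cross_entropy(speaker, listener, meaning_prior, utterance_prior):
    """Assumed objective: -sum over (u, m) of P_S(u, m) * log P_L(u, m)."""
    total = 0.0
    for m, p_m in enumerate(meaning_prior):
        for u, p_u in enumerate(utterance_prior):
            p_s = speaker[m][u] * p_m    # speaker joint P_S(u, m)
            p_l = listener[u][m] * p_u   # listener joint P_L(u, m)
            if p_s == 0:
                continue                 # zero-probability pairs contribute nothing
            if p_l == 0:
                return math.inf          # speaker uses a pair the listener rules out
            total -= p_s * math.log(p_l)
    return total


# Toy lexicon (hypothetical): utterance 0 is ambiguous between both meanings,
# utterance 1 picks out meaning 1 only.
lexicon = [[1, 1],
           [0, 1]]
prior = [0.5, 0.5]  # uniform prior, used for both meanings and utterances

l0 = literal_listener(lexicon, prior)
s1 = pragmatic_speaker(l0)
l1 = pragmatic_listener(s1, prior)
ce = cross_entropy(s1, l1, prior, prior)
print(s1, l1, ce)
```

In this toy case the pragmatic listener resolves the ambiguous utterance toward the meaning that has no dedicated alternative, which is the kind of behavior the "context" simulations build on. Lower cross-entropy here means the speaker's and listener's joint distributions agree more closely.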