make review changes
benpeloquin7 opened this issue · 0 comments
coordinator review
score 6/6
Type of Submission
Cognitive Science, Linguistics
The Review
This metareview is essentially a summary of the reviews from the three very
confident experts, as my own expertise for reviewing this paper is limited. All
three reviewers consider this a very interesting, sound, well-written, and
original paper that should be presented at CogSci. Reviewers 1 and 3 provide some
suggestions for (further) improving the presentation.
Building on work by Piantadosi et al. (2011), this paper shows how in-the-moment
properties of interaction may lead to properties of language systems by simulating
how languages arrive at systems of ambiguous symbols through the exploitation of
symbolic information in the context of these symbols and how pragmatic 'agents' in
turn exploit such systems. A new efficiency objective is proposed in the framework
of Rational Speech Act Theory, and it is used to evaluate how context influences
the communicative efficiency of ambiguity. This is done via two numerical
simulations. The first simulation considers a large sample of hypothetical
reference games, and shows that the proportion of optimally efficient languages
that exhibit ambiguity increases with the number of contexts. The second
simulation shows that rational pragmatic agents use context and ambiguity
efficiently, for a fixed language. That is, the use of low-cost ambiguous
utterances increases with the availability of disambiguating context.
Two of the reviewers point out that the theoretical embedding suggests a much
wider scope than is actually addressed by this research, i.e. the emergence of
system-level language properties vs. linguistic ambiguity. Reviewer 1 adds that
the kind of ambiguity investigated is restricted to references to objects and
attributes.
----------------------------------------------------------------
reviewer 1 review
score 6/6
Type of Submission
Cognitive Science, Linguistics
The Review
=== originality and significance ===
This paper shows how in-the-moment properties of interaction may lead to
properties of language systems by simulating how languages arrive at systems of
ambiguous symbols through the exploitation of symbolic information in the context
of these symbols as well as how pragmatic 'agents' in turn exploit such systems. I
think these are exciting findings that fulfil a promise set out in the Piantadosi
et al. (2011) work that hasn't really been worked out yet (viz. the role of the
context). While I have theoretical qualms with the approach, the findings are
significant, and the approach is well-motivated, original, and relevant. I highly
recommend this paper to be accepted.
=== technical soundness ===
The various components of the approach are well established and their combination
didn't raise any issues for me.
=== theoretical merit ===
I found the results quite interesting (as stated above), but the theoretical
starting point of this paper and the scope of the implications of the results
warrant some attention that they are currently not given. I don't think this
seriously impacts the results of this paper, but I haven't seen much of a
discussion of these starting points elsewhere either (if there is work addressing
these concerns, it should be referred to in the paper -- I do realise the limited
space of a CogSci paper doesn't allow for such elaboration).
The 'reference game', as the authors describe it, only constitutes a limited part
of what interlocutors do with language. Causally connecting referential behaviour
with language structure requires an auxiliary argument that reference is indeed a
central facet of communicative behaviour (which I think everybody agrees is
unproblematic) that is furthermore present across the entirety of the lexical
system (otherwise one cannot claim that 'the structure of the lexicon' is the way
it is because of this kind of behaviour). That latter point seems more tenuous to
me. Surely, nouns referring to concrete objects and adjectives referring to
physically perceivable qualities may be used primarily to do reference work, but
there are plenty of lexical elements in the language system that *are* ambiguous
but that *do not* do reference work. These elements are involved in aspects of
communication like evaluative/epistemic stance marking (verbs, adverbs), and
raising conversational implicatures (most other nouns and adjectives). The
adequate recognition of these communicative intentions seems to be of a different
nature than the establishment of successful reference to a real-world entity, and
yet lexical items fulfilling the former set of functions do display ambiguity as
well. This means that while the accuracy of the rational-pragmatic speaker account
may be good, its coverage is limited under its current formulation. This is not a
flaw, but it is something that needs to be acknowledged if the paper doesn't want
to come across as overselling its point.
Ambiguity, furthermore, comes in many different flavours (the use of deictics and
anaphora, vagueness and underspecification, structural polysemy, irregular
polysemy, homonymy). These different kinds of ambiguity have different profiles,
language-structurally (polysemes often have many senses, homonyms are more
limited, vagueness/underspecification is unlimited), communicatively (certain
kinds of ambiguity need not be resolved until the resolution is made relevant,
others do), as well as in language processing (homonyms display slower processing,
polysemes faster; anaphora require different resolution mechanisms than lexical
ambiguities like homonyms and polysemes). It is not clear what kind of ambiguity
is targeted here. If the authors intend their approach to apply to any kind of
ambiguity, I would like to see more discussion of how the various differences
between kinds of ambiguity do not matter for this approach.
=== breadth of interest ===
The topic is of central importance to anyone interested in the cognitive science
of language and by that count, the paper will be interesting to many.
One issue I had was that the paper relies quite heavily on mathematical
formulations that are only referred to but not worked out or made intuitive in a
lot of detail, and the presentation of the formalism moves at a fast pace. Knowing
this line of work, I was able to follow the paper relatively easily, but people
who are new to this approach might be scared off by the lack of self-
containedness. I understand space limitations are prohibitive of resolving this,
but it does make the paper a bit less accessible to a broad CogSci audience than
is desirable. The addition of an SI is a commendable move and definitely mitigates
the situation.
=== clarity of writing ===
The paper was generally well written. See comments above regarding the high speed
with which concepts are introduced.
=== detailed comments ===
The different groups in Figs 2 and 3 don't show in black/white printing. Perhaps
use different shapes?
----------------------------------------------------------------
reviewer 2 review
score 5/6
Type of Submission
Cognitive Science, Linguistics
The Review
This paper is an investigation of how properties of natural language might be
determined by functional pressures. The functional pressure in this case is
efficiency, and the property of natural language investigated is a reliance on
ambiguous terms. The authors define an 'objective' cost/reward function which
describes the efficiency of a system given the expected effort involved in
communication for speaker and listener, these being calculated as proportional to
the surprisal of utterances across contexts.
In the first simulation, all possible 'languages' are generated which associate at
least one meaning with each utterance, and at least one utterance with each
meaning: associations are boolean, and there are four utterances and meanings. A
set of contexts consisting of probability distributions over meanings are then
generated using a uniform Dirichlet, this set being between 1 and 4 in size. The
languages are then evaluated for how well they perform according to this set of
contexts, and the ones which minimise the subjective function are compared with
the languages which are optimal for the speaker and listener, in order to see what
proportion of these languages contain ambiguity. Languages which are optimal for
speakers always contain ambiguity, those which are optimal for hearers never
contain ambiguity, and those which optimise the subjective function contain more
ambiguity as the context size increases. The authors argue that this shows how
languages which are optimised for efficiency resemble natural language, in that
they contain ambiguity when contextual information is available.
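To make the size of the search space in simulation 1 concrete, the sketch below
enumerates the boolean languages described above. It is only an illustration,
not the authors' code: the names `all_languages` and `is_ambiguous` are
hypothetical, and "ambiguous" is assumed to mean that at least one utterance is
associated with more than one meaning. Context sampling and the efficiency
evaluation are left out.

```python
from itertools import product

N = 4  # four utterances and four meanings, as described in the review


def all_languages(n=N):
    """Yield every boolean n x n language (rows = utterances, columns = meanings)
    in which each utterance has at least one meaning and each meaning has at
    least one utterance."""
    for bits in product((0, 1), repeat=n * n):
        rows = [bits[i * n:(i + 1) * n] for i in range(n)]
        if all(any(r) for r in rows) and all(any(c) for c in zip(*rows)):
            yield rows


def is_ambiguous(language):
    """Assumed definition: some utterance maps to more than one meaning."""
    return any(sum(row) > 1 for row in language)


languages = list(all_languages())
n_ambiguous = sum(is_ambiguous(lang) for lang in languages)
print(len(languages), n_ambiguous)  # prints "41503 41479"
```

Under this definition only the 24 one-to-one languages are unambiguous, so the
question of interest is which of the remaining languages turn out to be optimal
once contexts are taken into account.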
In the second simulation, a discourse version of the Rational Speech Act is used
where non-pragmatic speakers are compared with partially and fully pragmatic ones,
where the latter have access to information gained over a discourse while the
former do not. Speakers can choose between more or less ambiguous terms from a
single language which contains both ambiguous and non-ambiguous terms, where
ambiguous terms are cheaper. Listeners also have to infer which of several
possible contexts they are in. Results show that non-pragmatic speakers always use
cheaper, more ambiguous terms, partially pragmatic speakers use more expensive,
less ambiguous terms, while fully pragmatic speakers are initially like partially
pragmatic speakers but gradually use more ambiguous, cheaper terms over time.
Finally, these strategies are evaluated under the subjective function, which shows
that all of them become more efficient over time, although the pragmatic
strategies converge more quickly, and the fully pragmatic one seems to be better.
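The qualitative pattern in simulation 2 can be illustrated with a one-shot RSA
speaker. This is a minimal sketch under stated assumptions, not the discourse
model used in the paper: the lexicon, the costs, and the parameter `alpha` are
hypothetical, and the listener's prior over meanings stands in for whatever
contextual information has accumulated over a discourse.

```python
import math

# Hypothetical toy lexicon: each utterance maps to truth values for (m0, m1).
LEXICON = {"a": (1, 1), "b0": (1, 0), "b1": (0, 1)}
# Assumed costs: the ambiguous utterance "a" is cheaper than the precise ones.
COST = {"a": 0.0, "b0": 1.0, "b1": 1.0}


def literal_listener(utterance, prior):
    """L0(m | u) is proportional to truth(u, m) * prior(m)."""
    weights = [t * p for t, p in zip(LEXICON[utterance], prior)]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]


def speaker(meaning, prior, alpha=1.0):
    """S1(u | m) is proportional to exp(alpha * (log L0(m | u) - cost(u)))."""
    scores = {}
    for u in LEXICON:
        p = literal_listener(u, prior)[meaning]
        if p > 0:  # utterances that are false of the meaning get no mass
            scores[u] = math.exp(alpha * (math.log(p) - COST[u]))
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}


# The more the context favours m0, the more often the speaker picks cheap "a".
for prior in [(0.5, 0.5), (0.7, 0.3), (0.9, 0.1)]:
    print(prior, round(speaker(0, prior)["a"], 3))
```

As the prior concentrates on the intended meaning, the probability of the cheap
ambiguous utterance rises, which is the direction of the effect summarised above.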
I really like the way that the authors introduce the paper: the empirical and
theoretical contextualisation is good. However, the framing is much broader than
the topic of the paper, i.e. linguistic ambiguity, so I do think that they could
make this clear much earlier (not in the abstract, which they do, but in the main
text). The technical details of the model are clear enough, although I think
there could be a little more clarity or discussion about the difference between
simulations 1 and 2, i.e. that s1 does not involve rational agents but is a search
through logical space, while s2 is an RSA agent model: for some reason this only
became clear to me after several readings, so I think you should make this more
explicit.
My main criticism regards the strength of the results relative to their framing.
For s1, the fact that optimal but contextualised languages can or should contain
ambiguity is not new (as the authors do point out): the main contribution here is
to show that languages which satisfy the efficiency requirements as defined here
contain ambiguity, mitigated by the amount of contextual information. So this
really hinges on how important we think it is to say that speaker/hearer optimised
languages will contain ambiguity. For s2, the main result seems to be that highly
pragmatic use results in highly efficient, partially ambiguous languages as
defined by the function here, but also that less pragmatic strategies and even
eventually non-pragmatic ones are ultimately efficient within a discourse. This is
fine, but I'm not sure how it links in to the big themes at the start of the
article. So while I think this is all very solid work, I feel as if it might be
framed a little more carefully with regard to how much the results speak to the
very large issues addressed at the start of the paper.
----------------------------------------------------------------
reviewer 3 review
score 6/6
Type of Submission
Cognitive Science, Linguistics, Psychology
The Review
This paper further explores the argument of Piantadosi et al. (2011), which
suggests that ambiguity is efficient for communication in the presence of context,
and it does so within the influential RSA framework. A new efficiency objective is
proposed, and it is used to evaluate how context influences the communicative
efficiency of ambiguity. This is done via two numerical simulations. The first
simulation considers a large sample of hypothetical reference games, and shows
that the proportion of optimally efficient languages that exhibit ambiguity
increases with the number of contexts. The second simulation shows that rational
pragmatic agents use context and ambiguity efficiently, for a fixed language. That
is, the use of low-cost ambiguous utterances increases with the availability of
disambiguating context.
This work addresses an important question which is of broad interest to the
community, especially given the growing interest in the communicative role of
ambiguity. The paper is clearly written, and the numerical simulations are well-
designed and provide insightful results. I therefore think that this work would
make a fine contribution to the CogSci conference.
The main limitation of this work is that it has not been tested against actual
experimental data, but only in numerical simulations of toy examples. While I
appreciate the value of such numerical analysis, the extent to which it accounts
for the efficiency of actual languages remains unknown.
In addition, I have a concern about the formulation, which may be important for
the theoretical justification of this approach. To be specific, it is not clear
how p(u|c) is defined/justified. On the one hand, footnote 3 says that it is
assumed that p(u|c) = p(u), and in simulation 1 p(u) is sampled from Dir(1,|U|)
independently of the speaker's distribution. On the other hand, the notation p(u)
and the justification for this term in the speaker's effort ("intuitively, the
number of bits needed to encode the utterance u", p3) suggest that
p(u) = \sum_{c,m} p(c)p(m|c)S(u|m,c) .
These two interpretations do not seem compatible with each other. If the first
interpretation is correct, then it does not correspond to the number of bits that
the speaker needs to encode u, and is thus not justified. If the second
interpretation is correct, then it is not clear how sampling p(u)~Dir(1,|U|) is
consistent with the marginal distribution of u written in the equation above.
Either way, there seems to be an issue here that requires some clarification.
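The tension between the two readings can be seen in a small numerical sketch.
All numbers and names below are hypothetical, and Dir(1,|U|) is assumed to mean
a symmetric Dirichlet with concentration 1 over |U| utterances. The marginal
p(u) is fixed once p(c), p(m|c) and the speaker S(u|m,c) are given, whereas an
independent Dirichlet draw has no reason to match it.

```python
import random


def marginal_utterance_prob(p_c, p_m_given_c, speaker):
    """p(u) = sum over c, m of p(c) * p(m|c) * S(u|m,c).

    speaker[c][m][u] holds the (assumed) speaker probability S(u|m,c)."""
    n_u = len(speaker[0][0])
    p_u = [0.0] * n_u
    for c, pc in enumerate(p_c):
        for m, pm in enumerate(p_m_given_c[c]):
            for u in range(n_u):
                p_u[u] += pc * pm * speaker[c][m][u]
    return p_u


def sample_dirichlet(n, rng, alpha=1.0):
    """Draw from a symmetric Dirichlet by normalising gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical game: two contexts, two meanings, two utterances.
p_c = [0.5, 0.5]
p_m_given_c = [[0.8, 0.2], [0.6, 0.4]]
# Deterministic speaker: meaning i is always expressed with utterance i.
speaker = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]

p_u = marginal_utterance_prob(p_c, p_m_given_c, speaker)
p_u_sampled = sample_dirichlet(2, random.Random(0))
print(p_u)          # marginal implied by the speaker: approximately [0.7, 0.3]
print(p_u_sampled)  # independent draw, generally different from the marginal
```

If p(u) is meant to be a code length for the speaker's own utterances, only the
first quantity has that interpretation.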
Besides this, I only have a few minor comments / suggestions:
1. "In this work, we derive a novel objective function from first principles"
(p1):
This statement is too strong, because it was not proven that the proposed
objective function follows directly from a small set of desired properties (e.g.,
as in Shannon's derivation of entropy from first principles).
2. Figure 2, B & C:
These two examples are not very interesting because it is clear what
ambiguous/unambiguous languages look like. It would be much more interesting to
get a better sense of the relation between the parameters of the referential game
and the optimal language. In other words, this figure could be improved by
visualizing these two optimal languages together with their corresponding {p(u),
p(m|c) p(c)}.
3. The notation is a bit inconsistent. For example, p and P are used
interchangeably. Strictly speaking, p(M|C) and P(M|C) would refer to different
distributions, but I don't think this is the intended meaning here (if it is the
case, then it should be stated more clearly).
4. A few typos:
- "according the to [to the ?] following generative model" (p3)
- "because speaking [it ?] is always low-cost" (p3)
- "Figure 2, panel (A) plots the proportion of optimal [ambiguous ?] languages
under each objective as a function of number of contexts" (p4)
- The left hand side in Eq.2 is missing.
----------------------------------------------------------------