benpeloquin7/zipf_principles

make review changes

benpeloquin7 opened this issue · 0 comments

coordinator review
score 6/6

  Type of Submission

    Cognitive Science, Linguistics

  The Review

    This metareview is essentially a summary of the reviews from the three very
    confident experts, as my own expertise for reviewing this paper is limited. All
    three reviewers consider this a very interesting, sound, well-written, and
    original paper that should be presented at CogSci. Reviewers 1 and 3 provide some
    suggestions for (further) improving the presentation.

    Building on work by Piantadosi et al. (2011), this paper shows how in-the-moment
    properties of interaction may lead to properties of language systems by simulating
    how languages arrive at systems of ambiguous symbols through the exploitation of
    symbolic information in the context of these symbols and how pragmatic 'agents' in
    turn exploit such systems. A new efficiency objective is proposed in the framework
    of Rational Speech Act Theory, and it is used to evaluate how context influences
    the communicative efficiency of ambiguity. This is done via two numerical
    simulations. The first simulation considers a large sample of hypothetical
    reference games, and shows that the proportion of optimally efficient languages
    that exhibit ambiguity increases with the number of contexts. The second
    simulation shows that rational pragmatic agents use context and ambiguity
    efficiently, for a fixed language. That is, the use of low-cost ambiguous
    utterances increases with the availability of disambiguating context.

    Two of the reviewers point out that the theoretical embedding suggests a much
    wider scope than is actually addressed by this research, i.e. the emergence of
    system-level language properties vs. linguistic ambiguity. Reviewer 1 adds that
    the kind of ambiguity investigated is restricted to references to objects and
    attributes.

----------------------------------------------------------------

reviewer 1 review
score 6/6

  Type of Submission

    Cognitive Science, Linguistics

  The Review

    === originality and significance ===
    This paper shows how in-the-moment properties of interaction may lead to
    properties of language systems by simulating how languages arrive at systems of
    ambiguous symbols through the exploitation of symbolic information in the context
    of these symbols as well as how pragmatic 'agents' in turn exploit such systems. I
    think these are exciting findings that fulfil a promise set out in the Piantadosi
    et al. (2011) work that hasn't really been worked out yet (viz. the role of the
    context). While I have theoretical qualms with the approach, the findings are
    significant, and the approach is well-motivated, original, and relevant. I highly
    recommend this paper to be accepted.

    === technical soundness ===
    The various components of the approach are well established and their combination
    didn't raise any issues for me.

    === theoretical merit ===
    I found the results quite interesting (as stated above), but the theoretical
    starting point of this paper and the scope of the implications of the results
    warrant some attention that they are currently not given. I don't think this
    seriously impacts the results of this paper, but I haven't seen much of a
    discussion of these starting points elsewhere either (if there is work addressing
    these concerns, it should be referred to in the paper -- I do realise the limited
    space of a CogSci paper doesn't allow for such elaboration).

    The 'reference game', as the authors describe it, only constitutes a limited part
    of what interlocutors do with language. Causally connecting referential behaviour
    with language structure requires an auxiliary argument that reference is indeed a
    central facet of communicative behaviour (which I think everybody agrees is
    unproblematic) that is furthermore present across the entirety of the lexical
    system (otherwise one cannot claim that 'the structure of the lexicon' is the way
    it is because of this kind of behaviour). That latter point seems more tenuous to
    me. Surely, nouns referring to concrete objects and adjectives referring to
    physically perceivable qualities may be used primarily to do reference work, but
    there are plenty of lexical elements in the language system that *are* ambiguous
    but that *do not* do reference work. These elements are involved in aspects of
    communication like evaluative/epistemic stance marking (verbs, adverbs) and
    raising conversational implicatures (most other nouns and adjectives). The
    adequate recognition of these communicative intentions seems to be of a different
    nature than the establishment of successful reference to a real-world entity, and
    yet lexical items fulfilling the former set of functions do display ambiguity as
    well. This means that while the accuracy of the rational-pragmatic speaker account
    may be good, its coverage is limited under its current formulation. This is not a
    flaw, but it is something that needs to be acknowledged if the paper doesn't want
    to come across as overselling its point.

    Ambiguity, furthermore, comes in many different flavours (the use of deictics and
    anaphora, vagueness and underspecification, structural polysemy, irregular
    polysemy, homonymy). These different kinds of ambiguity have different profiles,
    language-structurally (polysemes often have many senses, homonyms are more
    limited, vagueness/underspecification is unlimited), communicatively (certain
    kinds of ambiguity need not be resolved until the resolution is made relevant,
    others do), as well as in language processing (homonyms display slower processing,
    polysemes faster; anaphora require different resolution mechanisms than lexical
    ambiguities like homonyms and polysemes). It is not clear what kind of ambiguity
    is targeted here. If the authors intend their approach to apply to any kind of
    ambiguity, I would like to see more discussion of how the various differences
    between kinds of ambiguity do not matter for this approach.

    === breadth of interest ===
    The topic is of central importance to anyone interested in the cognitive science
    of language and by that count, the paper will be interesting to many.
    One issue I had was that the paper relies quite heavily on mathematical
    formulations that are only referred to but not worked out or made intuitive in a
    lot of detail and the presentation of the formalism moves at a fast pace. Knowing
    this line of work, I was able to follow the paper relatively easily, but people
    who are new to this approach might be scared off by the lack of self-
    containedness. I understand space limitations are prohibitive of resolving this,
    but it does make the paper a bit less accessible to a broad CogSci audience than
    is desirable. The addition of an SI is a commendable move and definitely mitigates
    the situation.

    === clarity of writing ===
    The paper was generally well written. See comments above regarding the high speed
    with which concepts are introduced.

    === detailed comments ===
    The different groups in Figs 2 and 3 don't show in black/white printing. Perhaps
    use different shapes?

----------------------------------------------------------------

reviewer 2 review
score 5/6

  Type of Submission

    Cognitive Science, Linguistics

  The Review

    This paper is an investigation of how properties of natural language might be
    determined by functional pressures. The functional pressure in this case is
    efficiency, and the property of natural language investigated is a reliance on
    ambiguous terms. The authors define an 'objective' cost/reward function which
    describes the efficiency of a system given the expected effort involved in
    communication for speaker and listener, these being calculated as proportional to
    the surprisal of utterances across contexts.

    In the first simulation, all possible 'languages' are generated which associate at
    least one meaning with each utterance, and at least one utterance with each
    meaning: associations are Boolean, and there are four utterances and meanings. A
    set of contexts, each a probability distribution over meanings, is then
    generated using a uniform Dirichlet, this set being between 1 and 4 in size. The
    languages are then evaluated for how well they perform according to this set of
    contexts, and the ones which minimise the objective function are compared with
    the languages which are optimal for the speaker and listener, in order to see what
    proportion of these languages contain ambiguity. Languages which are optimal for
    speakers always contain ambiguity, those which are optimal for hearers never
    contain ambiguity, and those which optimise the objective function contain more
    ambiguity as the number of contexts increases. The authors argue that this shows how
    languages which are optimised for efficiency resemble natural language, in that
    they contain ambiguity when contextual information is available.
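
    The setup summarised in this paragraph can be sketched in a few lines. This is
    only an illustration of the description given here, not the authors' code: the
    function names are hypothetical, "ambiguous" is assumed to mean that some
    utterance maps to more than one meaning, and the uniform Dirichlet is drawn by
    normalising Gamma(1, 1) samples.

```python
import itertools
import random


def enumerate_languages(n_utterances=4, n_meanings=4):
    """Yield every Boolean utterance-by-meaning table in which each utterance
    has at least one meaning and each meaning has at least one utterance."""
    for bits in itertools.product((0, 1), repeat=n_utterances * n_meanings):
        rows = [bits[i * n_meanings:(i + 1) * n_meanings]
                for i in range(n_utterances)]
        every_utterance_used = all(any(row) for row in rows)
        every_meaning_covered = all(any(col) for col in zip(*rows))
        if every_utterance_used and every_meaning_covered:
            yield rows


def is_ambiguous(language):
    """Assumed definition: some utterance is associated with several meanings."""
    return any(sum(row) > 1 for row in language)


def sample_context(n_meanings=4, rng=random):
    """Draw p(m|c) from a uniform Dirichlet via normalised Gamma(1, 1) draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_meanings)]
    total = sum(draws)
    return [d / total for d in draws]


languages = list(enumerate_languages())
n_ambiguous = sum(is_ambiguous(lang) for lang in languages)

rng = random.Random(0)
contexts = [sample_context(rng=rng) for _ in range(3)]  # between 1 and 4 contexts

print(len(languages), n_ambiguous)  # prints: 41503 41479
```

    Under these assumptions only the 24 one-to-one mappings are unambiguous, so
    nearly all candidate languages contain ambiguity before any efficiency
    criterion is applied. The sketch stops short of scoring languages, since the
    objective itself is not spelled out in this review.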

    In the second simulation, a discourse version of the Rational Speech Act is used
    where non-pragmatic speakers are compared with partially and fully pragmatic ones,
    where the latter have access to information gained over a discourse while the
    former do not. Speakers can choose between more or less ambiguous terms from a
    single language which contains both ambiguous and non-ambiguous terms, where
    ambiguous terms are cheaper. Listeners also have to infer which of several
    possible contexts they are in. Results show that non-pragmatic speakers always use
    cheaper, more ambiguous terms, partially pragmatic speakers use more expensive,
    less ambiguous terms, while fully pragmatic speakers are initially like partially
    pragmatic speakers but gradually use more ambiguous, cheaper terms over time.
    Finally, these strategies are evaluated under the objective function, which shows
    that all of them become more efficient over time, although the pragmatic
    strategies converge more quickly, and the fully pragmatic one seems to be better.
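
    The qualitative effect described here (cheap ambiguous terms becoming more
    attractive as context disambiguates them) can be illustrated with a textbook
    one-shot RSA speaker. This is a minimal sketch and not the discourse model
    used in the paper: the three-utterance lexicon, the costs, the priors, and
    the value of alpha are all hypothetical.

```python
import math

# Hypothetical toy lexicon: each utterance lists its truth value for (m0, m1).
LEXICON = {
    "a": (1, 1),    # cheap and ambiguous: true of both meanings
    "b0": (1, 0),   # costly and unambiguous
    "b1": (0, 1),   # costly and unambiguous
}
COST = {"a": 0.0, "b0": 1.0, "b1": 1.0}


def literal_listener(utterance, prior):
    """L0(m | u) is proportional to [[u]](m) * P(m | context)."""
    scores = [truth * p for truth, p in zip(LEXICON[utterance], prior)]
    total = sum(scores)
    return [s / total for s in scores]


def pragmatic_speaker(meaning, prior, alpha=1.0):
    """S1(u | m) is proportional to exp(alpha * (log L0(m | u) - cost(u)))."""
    weights = {}
    for u in LEXICON:
        l0 = literal_listener(u, prior)[meaning]
        # Written as a product so that L0 = 0 gives weight 0 without log(0).
        weights[u] = (l0 ** alpha) * math.exp(-alpha * COST[u])
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


flat = pragmatic_speaker(0, prior=(0.5, 0.5))         # uninformative context
informative = pragmatic_speaker(0, prior=(0.9, 0.1))  # context favours m0

print(round(flat["a"], 3), round(informative["a"], 3))
```

    With these made-up numbers the speaker's probability of choosing the cheap
    ambiguous utterance "a" for meaning m0 rises when the context prior already
    favours m0, which is the direction of the effect reported for the pragmatic
    speakers.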

    I really like the way that the authors introduce the paper: the empirical and
    theoretical contextualisation is good. However, the framing is much broader than
    the topic of the paper, i.e. linguistic ambiguity, so I do think that they could
    make this clear much earlier (they do so in the abstract, but not early in the main
    text). The technical details of the model are clear enough, although I think
    there could be a little more clarity or discussion about the difference between
    simulations 1 and 2, i.e. that s1 does not involve rational agents but is a search
    through logical space, while s2 is an RSA agent model: for some reason this only
    became clear to me after several readings, so I think you should make this more
    explicit.

    My main criticism regards the strength of the results relative to their framing.
    For s1, the fact that optimal but contextualised languages can or should contain
    ambiguity is not new (as the authors do point out): the main contribution here is
    to show that languages which satisfy the efficiency requirements as defined here
    contain ambiguity, mitigated by the amount of contextual information. So this
    really hinges on how important we think it is to say that speaker/hearer optimised
    languages will contain ambiguity. For s2, the main result seems to be that highly
    pragmatic use results in highly efficient, partially ambiguous languages as
    defined by the function here, but also that less pragmatic strategies and even
    eventually non-pragmatic ones are ultimately efficient within a discourse. This is
    fine, but I'm not sure how it links in to the big themes at the start of the
    article. So while I think this is all very solid work, I feel as if it might be
    framed a little more carefully with regard to how much the results speak to the
    very large issues addressed at the start of the paper.

----------------------------------------------------------------

reviewer 3 review
score 6/6

  Type of Submission

    Cognitive Science, Linguistics, Psychology

  The Review

    This paper further explores the argument of Piantadosi et al. (2011), which
    suggests that ambiguity is efficient for communication in the presence of context,
    and it does so within the influential RSA framework. A new efficiency objective is
    proposed, and it is used to evaluate how context influences the communicative
    efficiency of ambiguity. This is done via two numerical simulations. The first
    simulation considers a large sample of hypothetical reference games, and shows
    that the proportion of optimally efficient languages that exhibit ambiguity
    increases with the number of contexts. The second simulation shows that rational
    pragmatic agents use context and ambiguity efficiently, for a fixed language. That
    is, the use of low-cost ambiguous utterances increases with the availability of
    disambiguating context.

    This work addresses an important question which is of broad interest to the
    community, especially given the growing interest in the communicative role of
    ambiguity. The paper is clearly written, and the numerical simulations are well-
    designed and provide insightful results. I therefore think that this work would
    make a fine contribution to the CogSci conference.

    The main limitation of this work is that it has not been tested against actual
    experimental data, but only in numerical simulations of toy examples. While I
    appreciate the value of such numerical analysis, the extent to which it accounts
    for the efficiency of actual languages remains unknown.

    In addition, I have a concern about the formulation, which may be important for
    the theoretical justification of this approach. To be specific, it is not clear
    how p(u|c) is defined/justified. On the one hand, footnote 3 says that it is
    assumed that p(u|c) = p(u), and in simulation 1 p(u) is sampled from Dir(1,|U|)
    independently of the speaker's distribution. On the other hand, the notation p(u)
    and the justification for this term in the speaker's effort ("intuitively, the
    number of bits needed to encode the utterance u", p3) suggest that

    p(u) = \sum_{c,m} p(c)p(m|c)S(u|m,c) .

    These two interpretations do not seem compatible with each other. If the first
    interpretation is correct, then it does not correspond to the number of bits that
    the speaker needs to encode u, and is thus not justified. If the second
    interpretation is correct, then it is not clear how sampling p(u)~Dir(1,|U|) is
    consistent with the marginal distribution of u written in the equation above.
    Either way, there seems to be an issue here that requires some clarification.
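
    The tension between the two readings can be made concrete with a small
    numerical sketch. All numbers below are hypothetical toy values, and the
    deterministic speaker is an assumption chosen only to keep the arithmetic
    easy to check. It computes the marginal from the equation above and compares
    it with an independently sampled p(u).

```python
import random


def marginal_utterance_prob(p_c, p_m_given_c, speaker):
    """Compute p(u) = sum over c, m of p(c) * p(m|c) * S(u|m,c).

    speaker[c][m] is the speaker's distribution over utterances."""
    n_utterances = len(speaker[0][0])
    p_u = [0.0] * n_utterances
    for c, pc in enumerate(p_c):
        for m, pm in enumerate(p_m_given_c[c]):
            for u in range(n_utterances):
                p_u[u] += pc * pm * speaker[c][m][u]
    return p_u


# Hypothetical toy game: 2 contexts, 2 meanings, 2 utterances.
p_c = [0.5, 0.5]
p_m_given_c = [[0.9, 0.1], [0.2, 0.8]]
# Assumed speaker: always says u0 for m0 and u1 for m1, in both contexts.
speaker = [[[1.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [0.0, 1.0]]]

p_u_marginal = marginal_utterance_prob(p_c, p_m_given_c, speaker)

# Independent draw of p(u) from a uniform Dirichlet (normalised Gamma(1, 1)).
rng = random.Random(0)
draws = [rng.gammavariate(1.0, 1.0) for _ in range(2)]
p_u_sampled = [d / sum(draws) for d in draws]

print(p_u_marginal)  # approximately [0.55, 0.45]
print(p_u_sampled)   # an unrelated distribution over the same utterances
```

    The marginal is fully determined by p(c), p(m|c) and the speaker, so a p(u)
    sampled independently of the speaker will in general not match it. Only the
    marginal version supports the reading of -log p(u) as the number of bits the
    speaker needs to encode u.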

    Besides this, I only have a few minor comments / suggestions:

    1. "In this work, we derive a novel objective function from first principles"
    (p1):
    This statement is too strong, because it was not proven that the proposed
    objective function follows directly from a small set of desired properties (e.g.,
    as in Shannon's derivation of entropy from first principles).

    2. Figure 2, B & C:
    These two examples are not very interesting because it is clear what
    ambiguous/unambiguous languages look like. It would be much more interesting to
    get a better sense of the relation between the parameters of the referential game
    and the optimal language. In other words, this figure could be improved by
    visualizing these two optimal languages together with their corresponding {p(u),
    p(m|c) p(c)}.

    3. The notation is a bit inconsistent. For example, p and P are used
    interchangeably. Strictly speaking, p(M|C) and P(M|C) would refer to different
    distributions, but I don't think this is the intended meaning here (if it is the
    case, then it should be stated more clearly).

    4. A few typos:
    - "according the to [to the ?] following generative model" (p3)
    - "because speaking [it ?] is always low-cost" (p3)
    - "Figure 2, panel (A) plots the proportion of optimal [ambiguous ?] languages
    under each objective as a function of number of contexts" (p4)
    - The left-hand side in Eq. 2 is missing.

----------------------------------------------------------------