Must

What are the issues?

What should the semantics of "must" be? In particular, should weakness be built into the semantics of "must"?
What (type/strength of) evidence does a speaker need (or do listeners attribute to speakers) for an utterance of "must p" to be licensed? In comparison, what (type/strength of) evidence is required for "p"?
What do speakers commit themselves to truth-conditionally in uttering "must p" ("p"? "might p"?)? Relatedly, what are the entailment relations between "must p" and "p"?

How have others dealt with the issues?

vFG (2010)

Semantics: strong "must". "must p" only defined if the kernel K (of propositions representing all of the speaker's direct information) does not directly settle p, ie does not either entail or contradict it, but p is in the deductive closure E of K. HOW DO YOU MAKE IT INTO THE KERNEL? -> they say "direct evidence or trustworthy reports" -> there seems to be no clear definition of this beyond appeal to intuition. Matthewson extends this: sensory evidence that p; trustworthy reports that p; world knowledge. She then says that all three of these are determined by TRUSTWORTHINESS.
Evidence: "must p": indirect evidence for p. "p": direct evidence for p. Evidence directness is a presupposition (as opposed to a conventional implicature).
Commitment/entailment: "must p" entails p; a speaker who utters "must p" is committed to p; "You can’t have direct information that P unless it is the case that P" (vFG, p. 371). (This is externalist silliness.)
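
The definedness and truth conditions above can be sketched in a few lines of Python. This is only a toy rendering under assumptions that are not in the notes: propositions are modeled as sets of world labels, "directly settles" is read as "some single kernel proposition entails or contradicts p", and the four worlds and the kernel contents are invented for illustration.

```python
from typing import FrozenSet, Iterable, Optional

Proposition = FrozenSet[str]  # a proposition = the set of worlds where it holds


def directly_settles(kernel: Iterable[Proposition], p: Proposition) -> bool:
    """Assumed reading: some single kernel proposition entails or contradicts p."""
    return any(x <= p or x.isdisjoint(p) for x in kernel)


def must(kernel: Iterable[Proposition], p: Proposition,
         worlds: Iterable[str]) -> Optional[bool]:
    """Return None on presupposition failure, else whether the kernel as a whole entails p."""
    kernel = list(kernel)
    if directly_settles(kernel, p):
        return None  # p is directly settled, so "must p" is undefined
    live_worlds = set(worlds).intersection(*kernel)  # worlds compatible with all of K
    return live_worlds <= p


# Hypothetical toy worlds and propositions.
WORLDS = frozenset({"rain+umbrellas", "sprinkler+umbrellas",
                    "rain+no umbrellas", "dry+no umbrellas"})
RAIN = frozenset({"rain+umbrellas", "rain+no umbrellas"})
WET_UMBRELLAS = frozenset({"rain+umbrellas", "sprinkler+umbrellas"})
NO_SPRINKLER = frozenset({"rain+umbrellas", "rain+no umbrellas", "dry+no umbrellas"})

print(must([WET_UMBRELLAS, NO_SPRINKLER], RAIN, WORLDS))  # True: inferred, not direct
print(must([RAIN], RAIN, WORLDS))                         # None: directly settled
print(must([WET_UMBRELLAS], RAIN, WORLDS))                # False: not entailed by K
```

Note that nothing in the sketch says how a proposition gets into the kernel in the first place, which is exactly the open question flagged above.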

Matthewson (in press)

Thinks it's wrong to call them "epistemic" modals anyway; treat all the epistemic modals as evidentials.

Semantics: AGNOSTIC
Evidence: Rather than requiring the absence of a particular type of evidence (say, perceptual evidence), "must" and "might" rule out evidence of a certain level of trustworthiness or reliability (ie cutting across the branches of Willett's taxonomy of evidence types). IS THIS SIMILAR TO WHAT WE'VE BEEN TALKING ABOUT AS EVIDENCE STRENGTH? -> She even calls it "strength" at some point (p. 8), citing herself (2011b)
Commitment/entailment: AGNOSTIC

Lassiter (to appear)

Semantics: weak "must" (replace vFG's "is entailed by the direct information" with "is the best explanation for the direct information").
Evidence: same as vFG? he doesn't really say much about the evidential component except that "must" encodes an indirect evidential meaning.
Commitment/entailment: splits up the issue into three components: 1. pragmatic strength (speaker commitment, blameworthiness), 2. semantic strength (doxastic status of p), 3. semantic strength (veridicity, "actual" status of p). His position: "must p" does not entail p; ie a speaker who utters "must p" is not committed to p, but just thinks p is the best explanation of the facts. AGNOSTIC on the veridicity issue (reasonably so).

Karttunen, 1972; Veltman, 1985; Kratzer, 1991

Semantics: Kratzer, 1991: weak "must". Veltman, 1985: strong bare.
Evidence:
Commitment/entailment:

What are the empirical predictions of L, M, vFG? What are cases that would help weed out theories?

Semantics

Whether necessity is built into the semantics of "must" seems to be a big deal, but I don't see that this is an empirical issue. Whether you say "must" is underlyingly strong but we feel that it's weak because of the evidential component it contributes; or whether you say it's weak, empirically I don't see that this will make a difference. vFG mention the Mastermind and the prime example, where it's clear (?) that the speaker has no uncertainty about p in their use of "must p", as evidence that "must" is sometimes not weak. Even if we accept these examples, just because "must" is sometimes strong doesn't mean that the strength needs to be in the semantics; maybe it's pragmatically derived in those cases. But they then claim that "must" is in fact never weak, citing examples like

#It must be raining, but perhaps it is not raining

which they claim is weird. Dan found some nice examples (eg example (10), p. 7) that have this same general form but go through just fine.

Again, the semantic strength issue seems to me not to be an empirical issue; what we take speakers to be committing themselves to, on the other hand, is. More below.

Speaker commitments / entailments

vFG claim the speaker commits himself to p in uttering "must p", using the "You were wrong! -- # No I wasn't!" blameworthiness test. Dan lists a counterexample (example (12) on p. 8), and makes the observation that the ease of obtaining definitive evidence to resolve whether or not p plays a role in how felicitous the "No I wasn't" response is. Under his account, the speaker is just committed to p being the best explanation of the facts at the time that "must p" was uttered, without committing to maximal certainty about p.

This is interesting. Some predictions: under vFG's account, listeners should always ascribe maximal certainty in p to speakers and consider them blameworthy if p turns out to be false. Under Dan's account, listeners should just believe that the speaker had no better explanation than p for the available evidence. That is, if the evidence is set up right, listeners should be able to ascribe to speakers a very weak belief in p after a "must p" utterance. More generally: vFG predict strong speaker belief in p (or listeners' ascription thereof), whereas Dan's account is compatible with a large degree of contextual variation in speaker belief strength in p (or listeners' ascription thereof).

What is it that should lead to variation under Dan's account? Quality of evidence (and ease of obtaining definitive p-truth resolving evidence, though he doesn't formalize that bit I think). More below.

Evidence

There are multiple dimensions along which to think about evidential restrictions on the use of "must p". vFG say the only restriction is that the evidence for p be indirect, ie that p not be in the kernel, but in the deductive closure of the kernel, and they seem to be saying that this is the sort of evidence that's in the right-most branch of Willett's taxonomy of evidence (but maybe also reportatives that aren't fully trustworthy?). Dan doesn't really talk much about the evidential component but seems to agree with the general approach of treating it as an indirectness presupposition. Matthewson on the other hand focuses on being clearer about the evidential component while remaining relatively agnostic about the rest of the business (speaker commitments, veridicity). For her, rather than being restricted to coming from the rightmost branch, the evidence needs to be below a certain trustworthiness or reliability threshold. That is, the evidence can come from any branch of the evidential tree, so long as it's not entirely trustworthy wrt the truth of p. Types of evidence that send propositions to the kernel:

i. information obtained by sensory observation in the utterance situation
ii. trustworthy reports
iii. general knowledge

The unifying property behind these is trustworthiness, she says. In a different place (p. 8) she explicitly speaks of evidence trustworthiness as evidence strength.

Predictions I: "must p" is only ok to use if the speaker has

vFG: indirect evidence (or untrustworthy reports?), but not other types of evidence
Matthewson: evidence of any type that's below some trustworthiness/strength threshold
Dan: indirect?

It seems that none of the accounts predict a gradient effect of evidence strength on the naturalness of or expectation for "must p". Even Matthewson seems to predict a step function: "must p" not ok above threshold, ok below threshold.
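
To make the contrast concrete, here is a small sketch of what a step-function prediction and a gradient prediction would look like as functions of evidence strength. Everything in it is an assumption for illustration: the 0-to-1 strength scale, the threshold and midpoint of 0.7, the slope, and the choice of a logistic curve as one possible shape for a gradient effect. None of the accounts discussed here commits to these values or to this functional form.

```python
import math


def must_ok_step(strength: float, threshold: float = 0.7) -> float:
    """Categorical licensing: "must p" is fine below the threshold, bad at or above it."""
    return 1.0 if strength < threshold else 0.0


def must_ok_gradient(strength: float, midpoint: float = 0.7, slope: float = 10.0) -> float:
    """Hypothetical gradient alternative: acceptability falls off smoothly with strength."""
    return 1.0 / (1.0 + math.exp(slope * (strength - midpoint)))


for s in (0.5, 0.65, 0.75, 0.9):
    print(s, must_ok_step(s), round(must_ok_gradient(s), 3))
```

The two curves only come apart near the threshold, so an experiment that wants to distinguish them needs several evidence strengths in that region rather than just a weak and a strong condition.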

Predictions II: What evidence will listeners ascribe to the speaker who utters "must p"?

vFG: only indirect evidence
Matthewson: only evidence below a certain trustworthiness threshold
Dan: indirect? evidence e that makes p the best explanation of e

Problems:

How can we independently establish directness of a piece of evidence e for a proposition p in terms of entailment relations between e and p? Does this even work for the general case?

Which issues do our experiments thus far contribute to?

Evidence

Speakers are more likely to use "must p" (compared to bare prejacent) as the evidence strength for p decreases, and they're more likely to use "must p" with indirect evidence than with direct evidence (barring some possible statistical trustworthiness issues -- ha). So, both vFG and Matthewson are a little right; but both of them are a little wrong, too: neither one makes the probabilistic prediction. Does Dan? That's still unclear to me.

Listeners rate the strength of the evidence available to the speaker as lower when the speaker says "must p" than when he says "p". Again, this is not a step function. Not enough power here to address the evidence type issue in addition to evidence strength.

Commitment

Our experiments estimate both (listeners' estimates of) speaker commitment and the resulting listener belief in p. In one experiment we ask them to rate how likely they think it is that p, in another, how likely they think it is that the speaker believes p. Overall listeners believe that p was less likely after observing "must p" than after observing the bare form, but for both the bare and the "must p" form they think the speaker believes p more strongly than they (the listeners) end up doing. And this is between subjects! There's no difference between resulting speaker and listener belief in p for "might p" and "probably p".

This suggests that listeners don't consider speakers to be fully committed to p when they produce "must p", in support of Dan and against vFG.

What are the modeling approaches we've been pursuing?

Possible worlds semantics with [[must p]] = "necessarily in all worlds p" and [[p]] = "in the actual world p"; QUD inference with three different QUDs (Is it raining in the actual world? Is there direct evidence? Is there indirect evidence?). The idea was that the incredibly low probability of making "must p" true would lead to a hyperbolic interpretation of "must p" (that's where it all started!!). Also, we assume that the more worlds there are in which p is true, the more likely the speaker is to have indirect evidence of p. But that is kind of a weird assumption to have, and in any case this model did not actually work. Also, Noah hates this (I think the issue was the unintuitive possible worlds semantics (as well as the weird assumptions about the relationship between possible worlds and indirect evidence)).
An M-implicature model where we posit a specific belief prior that is peaky at the high and low ends. This model assumes that "p" and "must p" both have a threshold semantics: "p" is true if P(p) > theta_bare, and "must p" is true if P(p) > theta_must. It also assumes that "must p" is much costlier to utter than "p", and that the speaker has the option of saying nothing (cost = 0) (although it turns out that the null utterance is not always necessary depending on the shape of the prior!). This model produces the desired effects, where the speaker believes that p has a higher probability of being true given "p" than given "must p". However, it is unclear whether we have a good justification for the prior, or whether it makes sense for "p" to have a threshold semantics. This model also only produces the listener's inferences about the speaker's beliefs, and says nothing about the inferred evidence or the inferred state of the world (listener's posterior beliefs about the world).
An M-implicature model where we posit a real-world state (e.g. rain) that generates different distributions of evidence types/strengths (e.g. seeing rain, seeing umbrellas, weather report). These different evidence types in turn generate different distributions over speaker's beliefs about probabilities of p. Like the previous model, this model assumes that "p" and "must p" both have a threshold semantics: "p" is true if P(p) > theta_bare, and "must p" is true if P(p) > theta_must. It also assumes that "must p" is much costlier to utter than "p". The model produces the desired effects, where the speaker believes that p has a higher probability of being true given she said "p" than given "must p". In addition, the model infers the probability that the speaker has certain kinds of evidence, as well as the probability of p being true in the world. This seems useful, and addresses the distinction between speaker commitment vs listener's interpretations (a difference that we find in the newest experiment comparing speaker beliefs vs listener beliefs). This model still has the issue of assuming a threshold semantics for "p", although we could perhaps argue (if we want to) that this is sort of like a slack threshold ("p" is "true" if P(p) is greater than some threshold, but doesn't have to be perfectly true).
An implicature model that is basically identical in structure to the M-implicature model described above, but does not posit a threshold semantics for "p"--"p" is true iff P(p) = 1. This means that given the utterance "p", the listener infers that the speaker believes p with certainty. However, the listener may still be uncertain about whether p is true in the world. This gets rid of the worry about threshold semantics for "p," but introduces relative weakness in the semantics of "must p".
An M-implicature + QUD model, where the implicature is driven by a "marked" QUD, namely an unlikely QUD about evidence type, as well as the more "marked" (costly) utterance "must p". This model has been implemented but does not work.
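
As a rough illustration of the evidence-based threshold model described above (world state generates evidence, evidence fixes speaker belief, utterances have threshold semantics and costs), here is a minimal speaker/listener recursion in Python. It is a sketch under loud assumptions, not the implemented model: all probabilities, thresholds, costs, and the evidence labels are invented; the thresholds are fixed by hand with theta_must < theta_bare, so part of the weakness of "must p" is stipulated rather than derived; a zero-cost null utterance is included; and the speaker's belief is taken to be the Bayesian posterior P(rain | evidence), which means this sketch cannot separate speaker belief from listener belief the way the experiments do.

```python
import math

# All numbers below are illustrative assumptions, not values from the notes.
P_RAIN = 0.5
EVIDENCE_GIVEN_WORLD = {  # P(evidence | rain?) ; each row sums to 1
    True: {"see rain": 0.50, "see umbrellas": 0.30, "no cue": 0.20},
    False: {"see rain": 0.01, "see umbrellas": 0.19, "no cue": 0.80},
}
EVIDENCE = list(EVIDENCE_GIVEN_WORLD[True])
THRESHOLD = {"p": 0.9, "must p": 0.6, "(say nothing)": -1.0}  # theta_bare, theta_must, null
COST = {"p": 1.0, "must p": 2.0, "(say nothing)": 0.0}
ALPHA = 1.0  # speaker rationality


def normalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def joint(world, evidence):
    """P(world, evidence)."""
    p_world = P_RAIN if world else 1.0 - P_RAIN
    return p_world * EVIDENCE_GIVEN_WORLD[world][evidence]


def evidence_prior(evidence):
    return joint(True, evidence) + joint(False, evidence)


def speaker_belief(evidence):
    """Assumption: the speaker's P(p) is the Bayesian posterior P(rain | evidence)."""
    return joint(True, evidence) / evidence_prior(evidence)


def literal_listener(utterance):
    """L0(evidence | u): prior over evidence, restricted to states where u is true."""
    return normalize({e: evidence_prior(e) * (speaker_belief(e) > THRESHOLD[utterance])
                      for e in EVIDENCE})


def speaker(evidence):
    """S1(u | evidence), proportional to L0(evidence | u)^alpha * exp(-alpha * cost)."""
    weights = {}
    for u in THRESHOLD:
        l0 = literal_listener(u)[evidence]
        weights[u] = (l0 ** ALPHA) * math.exp(-ALPHA * COST[u]) if l0 > 0 else 0.0
    return normalize(weights)


def pragmatic_listener(utterance):
    """L1(world, evidence | u), proportional to P(world, evidence) * S1(u | evidence)."""
    return normalize({(w, e): joint(w, e) * speaker(e)[utterance]
                      for w in (True, False) for e in EVIDENCE})


def summarize(utterance):
    """Return (P(rain | u), posterior over the speaker's evidence given u)."""
    post = pragmatic_listener(utterance)
    p_rain = sum(p for (w, _), p in post.items() if w)
    evidence_post = {e: post[(True, e)] + post[(False, e)] for e in EVIDENCE}
    return p_rain, evidence_post


for u in ("p", "must p"):
    print(u, summarize(u))
```

With these invented numbers, hearing "p" pins the speaker's evidence to "see rain", whereas "must p" leaves a mixture of "see rain" and "see umbrellas", so the listener's P(rain) is lower after "must p" than after "p". Replacing the fixed thresholds with variables that the listener infers would be the natural next step for testing whether the weakness can come from cost alone.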

How do the modeling approaches we've been pursuing contribute to the issues?

So far, it seems like we can make the following minor contributions...

We show that it is in principle possible for the weakness of "must" to arise from M-implicature, without baking in different literal semantics for "p" and "must p"
Our model is able to make predictions at once about speaker beliefs, speaker evidence, and listener beliefs in a fairly coherent manner

What should we do next? What could our contribution be?

It seems to me that there are two things that we could contribute:

Clarifying the evidential component that "must" contributes (type vs strength, and how are those cashed out?)
Building a probabilistic model that
- gets listeners' resulting degree of belief in p (when utterance is "p" vs "must p")
- gets listeners' ascription of degree of belief in p to speaker who utters "p" vs "must p"
- gets listeners' ascription of evidence strength/type for p to speaker who utters "p" vs "must p"
- gets speakers' probability of producing "p" vs "must p", given evidence strength/type

Experiments

tease apart issue of evidence type and strength on
1. production (choice between "p" and "must p")
2. comprehension (listeners' ascription of evidence to speakers) -- independent effects? does one subsume the other?

Models

Figure out why M implicature + QUD model isn't working
Decide whether it makes sense to have a threshold semantics for "p"
Decide whether it makes sense for listeners first to infer weakness and then from weakness infer indirect evidence (experiments to support this?)
Implement a model where listeners first infer indirect evidence and then infer weakness
Implement a model where indirect evidence is baked into the literal semantics of "must" (vFG, Matthewson)
In general, clarify relationship between evidence and speaker belief

Questions

If "must" is an evidential, why doesn't the evidential component show up with other flavors of modality?
What's the directionality between evidence and belief strength when observing "must p"?
- vFG: Listener observes "must p" --> gets weak evidence for p for free --> infer less than maximal probability of p
- Dan(?): Listener observes "must p" --> gets weak evidence for p for free --> infers weak speaker belief in p.
- Listener observes "must p" --> gets weak evidence for p for free --> infers weak speaker belief in p and less than maximal probability of p.
- Listener observes "must p" --> gets weak belief for p for free --> infers weak speaker evidence for p.

Quotes and other useful bits

Matthewson (in press, p. 1) on the issue of all epistemic modals being evidentials:

all epistemic modals encode evidential information, as a matter of definition, since an ‘epistemic modal’ is a modal whose modal base relies on evidence (not on knowledge)

Westmoreland (1995, p. 699) as cited in vFG (2010) on the indirectness contribution of "must":

epistemic must ‘‘contributes the information that the propositional content of the sentence is inferred rather than known’’

vFG (2010), making a strong (ie stupid) ontological commitment:

you can’t have direct information that P unless it is the case that P. So for a modal uttered at w, with respect to a kernel K, we know that w ∈ ∩K.

Matthewson (in press, p. 9) on the distinction between evidence trustworthiness/strength and speaker certainty:

It is important that the trustworthiness distinction is still an evidential notion, and does not reduce to speaker certainty about the prejacent proposition.

Willett (1988, p. 57)'s taxonomy of evidence types:

direct

attested -- visual / auditory / other sensory

indirect

reported -- second-hand / third-hand / folklore

inferring -- results / reasoning

Relation to attitude verbs

How does the view of epistemic "must" as an evidential marker relate to the rich literature on factivity/veridicity in attitude verbs and whether the content of the mental state or the state of the reported conversation gets foregrounded/backgrounded?

Extra

tgrep2_search/results/bncs.tab contains the results (19437 cases) of a tgrep2 search on the spoken BNC for necessity modals as specified in the @MODAL macro in tgrep2_search/MACROS.ptn

Results database currently contains unique ID, two lines of previous context, the sentence containing the modal, a column coding whether the modal is must/have to/have got to, a column coding whether (for the non-must cases) the form is third-person-sg/past tense/other, and the POS of the modal as annotated in the corpus (this was the beginning of an attempt to extract the verbs following the modals, to see whether there are any interesting differences).
