Does knowledge, and only knowledge, constitute evidence?

1. Introduction

Here, I argue that all and only knowledge constitutes evidence (‘E=K’). This is a fairly controversial claim that Williamson defends in Knowledge and Its Limits (2000) and, more tightly, in ‘E=K, but what about R?’ (2024a). In the latter, Williamson lays out three functional roles of evidence:

The job description for evidence includes three general tasks. First, evidence rules out some hypotheses by being inconsistent with them. Second, in inference to the best explanation (closely related to abduction), evidence is what the best hypothesis best explains. Third, in probabilistic confirmation, evidence is what the probability of the hypothesis is conditional on.

But by delving a little into the mechanics of conditionalization, I show how the third criterion naturally generates the first two, yielding a very clean argument that E=K.

In brief: you should never outright discard a possibility from consideration when, given what you know, it might actually obtain; conversely, you may always discard a possibility from consideration when what you know guarantees that it can’t obtain. Of course, there are practical limitations to these two claims; but, strikingly, those practical limitations apply in precisely the same way to evidence. One conjecture is that individual reasoning is something like an internalization of collective reasoning: we tend to think about the premises of sound collective reasoning under the guise of ‘evidence’, and the premises of sound individual reasoning under the guise of ‘knowledge’.

I assume that we start with a good working grasp on knowledge and evidence. In particular, we know roughly what knowledge requires, and where knowledge is required. Further, we can roughly specify the core function of evidence across various domains, from everyday discourse to scientific investigation, as what we may conditionalize on. I argue that knowledge perfectly fits this description, such that one’s total stock of evidence comprises just what one knows.

Here is the plan. §2 argues that all evidence is knowledge, with an accessible description of the relevant technical details. §3 argues that all knowledge is evidence. §4 addresses some objections to the resultant picture that E=K.

2. All Evidence Is Knowledge

The core role of evidence, especially in scientific practice, is to judge the relative plausibility of competing hypotheses, as modelled by conditionalization. But – crucially – conditioning on $E$ fundamentally involves discarding every possibility inconsistent with $E$. There is something wrong with discarding possibilities which, for all one knows, might be actual; so, one may properly condition only on what one knows. That is, all evidence is knowledge.

To defend the crucial claim in the foregoing argument, we must get slightly technical. Let $\Omega$ be a set; call its elements worlds, and its subsets propositions. A proposition $A$ is true at some world $w$ just in case $w$ is in $A$. Thus, we’ve identified each proposition, in a coarse-grained way, by the set of worlds at which it’s true. (This much should be familiar from intensional semantics.)

Now, let $\mathcal{F}$ be a logically closed set of propositions. That is:

  1. $\mathcal{F}$ contains $\varnothing$ (the proposition true at no world).
  2. If $\mathcal{F}$ contains $A$, then it contains $\neg A$ (the proposition true just where $A$ is not true).
  3. If $\mathcal{F}$ contains each of countably many propositions $A_1, A_2, \ldots$, then it contains $A_1 \wedge A_2 \wedge \cdots$ (the proposition true just where all of the given propositions are true).

Then $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and $(\Omega, \mathcal{F})$ is a measurable space.1
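For concreteness, here is a minimal Python sketch of this setup (my own illustration, not from the paper), using a toy three-world space: propositions are represented as frozensets of worlds, and the full power set serves as the logically closed family, which we can check against the three closure conditions above. All names and the choice of worlds are illustrative.

```python
from itertools import chain, combinations

# A toy set of worlds (Omega); propositions are subsets of it.
worlds = frozenset({"w1", "w2", "w3"})

def power_set(s):
    """All subsets of s, as frozensets: the largest logically closed family."""
    return {frozenset(c) for c in chain.from_iterable(
        combinations(sorted(s), r) for r in range(len(s) + 1))}

propositions = power_set(worlds)

def negation(a):
    """The proposition true just where a is not true (set complement)."""
    return worlds - a

def conjunction(*props):
    """The proposition true just where all the given propositions are true."""
    return frozenset.intersection(worlds, *props)

# Check the three closure conditions from the text.
assert frozenset() in propositions                      # contains the empty proposition
assert all(negation(a) in propositions for a in propositions)
assert all(conjunction(a, b) in propositions
           for a in propositions for b in propositions)
print(f"{len(propositions)} propositions over {len(worlds)} worlds; closure holds.")
```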

Now, let $\mu$ be a function from $\mathcal{F}$ to the non-negative (extended) real numbers such that:

  1. $\mu(\varnothing) = 0$.
  2. If $A \wedge B = \varnothing$ (that is, $A$ and $B$ are disjoint), then $\mu(A \vee B) = \mu(A) + \mu(B)$.2

Then $\mu$ is a measure on $(\Omega, \mathcal{F})$, and $(\Omega, \mathcal{F}, \mu)$ is a measure space. If $\mu(\Omega) = 1$, then $\mu$ is a probability measure, for which we write $P$.3 In particular, $(\Omega, \mathcal{F}, P)$ is a probability space, and we call $\Omega$ the sample space and $\mathcal{F}$ the event space. For any event $A$ in $\mathcal{F}$, $P(A)$ is the probability of $A$.
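Continuing the toy Python sketch (again my own illustration, with made-up weights): on a finite space, a probability measure can be given by non-negative weights on worlds summing to one, with the probability of a proposition being the sum of the weights of its worlds; additivity on disjoint propositions then holds automatically.

```python
# Toy probability space: worlds with non-negative weights summing to 1.
weights = {"w1": 0.5, "w2": 0.3, "w3": 0.2}      # illustrative ur-prior
worlds = frozenset(weights)

def P(a):
    """Probability of proposition a (a set of worlds): sum of its world weights."""
    return sum(weights[w] for w in a)

# The defining conditions of a (finitely additive) probability measure.
assert P(frozenset()) == 0                       # the empty proposition gets measure 0
assert abs(P(worlds) - 1.0) < 1e-12              # total probability is 1
a, b = frozenset({"w1"}), frozenset({"w2", "w3"})
assert not (a & b)                               # a and b are disjoint
assert abs(P(a | b) - (P(a) + P(b))) < 1e-12     # additivity on disjoint events
print("P({w1, w2}) =", P(frozenset({"w1", "w2"})))
```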

Next, we define conditional probability. Suppose that we only care about worlds in some particular event $B$, and so discard all the worlds not in $B$. This induces a subspace of our probability space. Our sample space naturally becomes $B$ itself. For each event $A$ in $\mathcal{F}$, discarding the non-$B$ worlds leaves us with the subset $A \cap B$. We thus have a restricted event space $\mathcal{F}_B = \{A \cap B : A \in \mathcal{F}\}$. But recall that $\mathcal{F}$ was logically closed, so $\mathcal{F}_B$ is just a subset of $\mathcal{F}$. In particular, all the events in $\mathcal{F}_B$ are already in the domain of our probability measure $P$. Restricting the domain of $P$ to $\mathcal{F}_B$ yields a measure on $(B, \mathcal{F}_B)$; but we can do better. So long as $P(B) > 0$, we can define the probability measure $P_B(C) = P(C)/P(B)$ for each $C$ in $\mathcal{F}_B$. Thus, we have a subspace $(B, \mathcal{F}_B, P_B)$.

Now, $P_B$ was only defined on $\mathcal{F}_B$; but we can naturally extend it to a probability measure on all of $\mathcal{F}$. Recall that every event in $\mathcal{F}_B$ is by definition of the form $A \cap B$ for some $A$ in $\mathcal{F}$. Thus, we can define $P(A \mid B) = P_B(A \cap B) = P(A \cap B)/P(B)$ for each $A$ in $\mathcal{F}$, which uniquely extends $P_B$ to a probability measure on $\mathcal{F}$. Now, we see that the definition of conditional probability is not as arbitrary as often presented; rather, it is (the unique extension of) the probability measure induced by restricting our sample space down to some particular (positive-probability) event. In particular, we now see that conditioning on some piece of evidence fundamentally involves discarding all the possibilities (formally, worlds) inconsistent with it.
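The restrict-and-renormalize construction is easy to check numerically. The following Python sketch (illustrative numbers of my own) discards the worlds outside $B$, renormalizes by $P(B)$, and confirms that the induced measure matches the familiar ratio formula $P(A \mid B) = P(A \cap B)/P(B)$.

```python
# Conditioning as discarding worlds and renormalizing, on a toy finite space.
weights = {"w1": 0.5, "w2": 0.3, "w3": 0.2}      # illustrative prior over worlds

def P(a):
    """Prior probability of a proposition (a set of worlds)."""
    return sum(weights[w] for w in a)

def conditioned_on(b):
    """Return the probability measure induced by restricting to worlds in b."""
    pb = P(b)
    if pb == 0:
        raise ValueError("cannot condition on a zero-probability event")
    # Discard the non-b worlds and renormalize the survivors by P(b).
    restricted = {w: weights[w] / pb for w in b}
    return lambda a: sum(p for w, p in restricted.items() if w in a)

B = frozenset({"w1", "w2"})                      # the evidence: 'not w3'
A = frozenset({"w1", "w3"})                      # some hypothesis
P_given_B = conditioned_on(B)

# The induced measure agrees with the ratio formula P(A & B) / P(B).
assert abs(P_given_B(A) - P(A & B) / P(B)) < 1e-12
print("P(A | B) =", P_given_B(A))                # 0.5 / 0.8 = 0.625
```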

So, the foregoing argument to the effect that all evidence is knowledge goes through. As a reminder, the thought is that one may not discard a possibility which, for all one knows, actually obtains. But the nature of evidence is that one is always in a position to condition on it. So, if something is not known and so may not be conditioned on, then it is not evidence. Contraposing, all evidence is knowledge.

Specifying evidence as what one may condition on automatically yields the result that all evidence is propositional. If something is not a proposition (and so cannot be modelled by an event), then it cannot be conditioned on, and so cannot be evidence in this sense. Of course, one may take less orthodox views of updating, such as ‘generalizing’ conditionalization to Jeffrey conditionalization; but this adds massive complication to the updating procedure, for no real gain in generality. In particular, we can always enrich the probability space to mimic the effect of Jeffrey conditioning using standard conditioning.4
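To illustrate the last point, here is a sketch of one such enrichment (my own construction for a toy case; the paper does not spell one out): a Jeffrey shift that moves the probability of a partition cell $E$ to a new value $q$, rescaling proportionally inside and outside $E$, can be reproduced by ordinary conditioning once we add a binary ‘signal’ coordinate whose joint probabilities are chosen appropriately.

```python
# Mimicking a Jeffrey shift by ordinary conditioning on an enriched space.
prior = {"w1": 0.5, "w2": 0.3, "w3": 0.2}         # illustrative prior over worlds
E = {"w1", "w2"}                                  # partition cell; P(E) = 0.8
q = 0.5                                           # target posterior probability for E

p_E = sum(prior[w] for w in E)
p_notE = 1.0 - p_E

# Jeffrey posterior over the original worlds, computed directly for comparison.
jeffrey = {w: (q * prior[w] / p_E if w in E else (1 - q) * prior[w] / p_notE)
           for w in prior}

# Enrich each world with a binary 'signal' coordinate, choosing the joint
# probabilities so that conditioning on signal=True reproduces the shift.
lam = 0.5 * min(p_E / q, p_notE / (1 - q))        # small enough to keep weights valid
enriched = {}
for w, pw in prior.items():
    scale = q / p_E if w in E else (1 - q) / p_notE
    enriched[(w, True)] = pw * lam * scale        # world-with-signal
    enriched[(w, False)] = pw - enriched[(w, True)]

# Ordinary conditioning on the signal event: discard and renormalize.
p_signal = sum(v for (w, s), v in enriched.items() if s)
conditioned = {w: enriched[(w, True)] / p_signal for w in prior}

for w in prior:
    assert abs(conditioned[w] - jeffrey[w]) < 1e-12
print("Jeffrey shift recovered by ordinary conditioning:", conditioned)
```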

3. All Knowledge Is Evidence

Given the previous section, we have that evidence is a subset of knowledge. Considerations of theoretical economy tell against taking evidence to be a special, strict subset of knowledge; E=K looks like the natural default hypothesis. Thus, even without much positive argument for the claim that all knowledge is evidence, we might tentatively adopt it if we accept that all evidence is knowledge.

However, we can offer positive arguments to the effect that all knowledge is evidence. Another way to frame the functional role of evidence is that evidence supports those hypotheses which successfully predict it. Suppose we have two events $H_1$ and $H_2$, representing two competing hypotheses, and a piece of evidence $E$. By Bayes’ Theorem,5 $P(H_1 \mid E) > P(H_2 \mid E)$ just in case $P(E \mid H_1)\,P(H_1) > P(E \mid H_2)\,P(H_2)$. So, if the two hypotheses start off equally plausible, we should prefer the one which better predicts our evidence. Now, suppose that two competing hypotheses are equally well-positioned, except that the first better predicts something that we know. Then we have reason to prefer the first hypothesis: given what we know, it is more plausible. This applies for any piece of knowledge. But a hypothesis gains nothing by predicting something which isn’t among our evidence, so that knowledge must be among our evidence. Put another way: if something lends support to hypotheses which predict it, then it thereby counts as evidence. So, what is known must count as evidence. Thus, all knowledge is evidence.6
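A quick numerical sanity check of the biconditional just stated (toy numbers of my own): the ordering of the posteriors $P(H_1 \mid E)$ and $P(H_2 \mid E)$ matches the ordering of the products $P(E \mid H_i)\,P(H_i)$, since both posteriors share the denominator $P(E)$.

```python
# Two hypotheses, equally plausible a priori; H1 predicts the evidence E better.
p_h1, p_h2 = 0.5, 0.5                  # illustrative priors
p_e_given_h1, p_e_given_h2 = 0.9, 0.4  # illustrative likelihoods of E

# P(E), assuming H1 and H2 are exclusive and exhaustive in this toy model.
p_e = p_e_given_h1 * p_h1 + p_e_given_h2 * p_h2

# Posteriors by Bayes' Theorem.
post_h1 = p_e_given_h1 * p_h1 / p_e
post_h2 = p_e_given_h2 * p_h2 / p_e

# The posterior ordering matches the ordering of likelihood-times-prior.
assert (post_h1 > post_h2) == (p_e_given_h1 * p_h1 > p_e_given_h2 * p_h2)
print(f"P(H1 | E) = {post_h1:.3f}, P(H2 | E) = {post_h2:.3f}")
```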

Our original framing of the functional role of evidence also supports the claim that all knowledge is evidence. Just as one may not properly ignore possibilities that might be actual, one may properly ignore possibilities inconsistent with whatever one knows. If one really knows that some possibility does not obtain, then one does no deep wrong in discarding it from consideration.

Of course, there might be practical reasons not to discard possibilities one knows aren’t actual. One can entertain counterfactual situations; one can temporarily set aside some piece of disputed knowledge in order to convince an interlocutor; one can proceed with caution when uncertain about whether one really does know.7 But these practical reasons apply equally to evidence. One can entertain scenarios inconsistent with one’s evidence, or set aside disputed pieces of evidence, or proceed with caution when something’s evidential status is uncertain. If this is right, then not only do we defuse the objection that there are practical limitations on knowledge, but we also gain a positive argument for equating knowledge and evidence: in particular, the practical limitations on knowledge bear a striking resemblance to the practical limitations on evidence. (Note that a similar response works if one is worried about the evidence-to-knowledge direction: there are some cases where we can go strictly beyond our knowledge for practical reasons, but these are just the cases where we can go strictly beyond our evidence.)

One way of explaining this resemblance, and furthering the case for E=K, is to take individual deliberation as something like an internalization of collective deliberation. That is, individual and collective reasoning are structurally similar, but we tend to think about premises of collective reasoning under the guise of ‘evidence’, and premises of individual reasoning under the guise of ‘knowledge’. But then evidence in collective reasoning constitutes (collective) knowledge, and knowledge in individual reasoning constitutes (individual) evidence.

The move of analogizing individual and collective deliberation has also been employed by Williamson (‘Collective Imagining’, 2024b) on imagination and by Kodsi (‘Self-Disagreement’, 2024) on inconsistency. The latter argues that inconsistency may just be the individual version of collective disagreement, such that to be inconsistent is just to disagree with oneself. This further bolsters E=K by suggesting that evidence should consist in knowledge rather than mere belief. If inconsistency is as pervasive (and even healthy) as the analogy with disagreement suggests, then one’s beliefs will often be contradictory, and conditioning on them will cause a crash: a contradiction receives probability zero, and conditioning on a zero-probability event is undefined. By contrast, since what is known must be true, knowledge cannot be contradictory and so does not face this problem.

4. Objections

Here’s one objection to E=K. By contrast with knowledge, nothing is evidence simpliciter, only evidence for this or evidence for that: if it’s not evidence for anything, then it’s not evidence! (On its own, this objection isn’t worrying: we could complicate our model of evidence, but why should we?) One motivation is that whether something counts as evidence seems to depend on the situation. For instance, perhaps my knowledge that I’ll vote in 2028 just can’t be evidence that I won’t die if I jaywalk across High Street, even though the former entails the latter. On this view, evidence comes and goes with something like the stakes of deliberation.

A first response to this position is to note that, on some subject-sensitive views of knowledge, knowledge comes and goes in exactly the same way. If such a view is true, then the purported asymmetry disappears. A second response is simply to deny that evidence comes and goes. If, next summer, I do know that I’ll vote in 2028, then I do in fact have evidence guaranteeing that I won’t die before then; this evidence just isn’t admissible. We have independent reason to think that, when the worry being addressed is high-stakes, it would be prudent to be reassured only by my most secure evidence, evidence which I can be sure that I have. Similarly, it would be prudent to rely only on knowledge which I know that I have. This explains why I shouldn’t use my voting in 2028 as a premise in reasoning, even if I have it among my total stock of evidence. So, the proposed complication of evidence is idle: what it seeks to explain is already adequately explained.

A second objection to E=K is that it trivializes evidence: all knowledge becomes self-justifying, in the sense that one always has perfect evidence guaranteeing what one knows, simply in virtue of knowing it. But, surely, I don’t have perfect evidence for everything I know.

One response is to appeal again to the ideology of admissible evidence. Rules of discourse usually require adducing different evidence for the claim under discussion, rather than blandly repeating the claim. So, the self-justification of knowledge only looks bad because it would be bad form in discourse, not because there is anything deeply objectionable. Something can be true but dialectically ineffective: for instance, if my opponent doesn’t believe in rational argument, I can’t use rational argument against her without thereby begging the question. Another (compatible) response is to note that some knowledge is self-justifying after all. As is well-known (at least among philosophers), Hesperus the evening star just is Phosphorus the morning star – both ‘Hesperus’ and ‘Phosphorus’ are names for the planet Venus. The fact expressed by ‘Hesperus appears in the evening, and Hesperus is Phosphorus’ is perfectly good evidence for the fact expressed by ‘Phosphorus appears in the evening’, even though – assuming intensionalism – these two [sic] facts are the same.

A third objection points to the acceptability of sentences like the following:

  1. I don’t know whether Claude stole it, but the evidence is that he did.
  2. I don’t have evidence that Claude stole it, but I know that he did.

If we take these at face value, the first suggests that some evidence isn’t knowledge, while the second suggests that some knowledge isn’t evidence.

But for the first case, we should note that there’s something suspect about the phrase ‘the evidence is that’. Granted, my claim being that $p$ requires that $p$ is part of my claim, not merely that $p$ is supported by my claim. But the evidence being that $p$ does not seem to require the same; in particular, ‘the evidence is that $p$’ seems closer to ‘the evidence suggests that $p$’ than to ‘that $p$ is among the evidence’ – certainly, ‘I don’t know whether Claude stole it, but that he did is among the evidence’ sounds much worse. For the second case, we must appeal to something like the response to the first objection: it only looks like one has no evidence because adducing such trivial evidence would be unacceptable. That is, ‘evidence’ in (2) must be elliptical for something like ‘independent evidence’.

The fourth objection, and the final one I’ll consider here, takes evidence to be something like non-inferential knowledge. To avoid skepticism, we should grant that knowledge can be gained from ampliative inference. After seeing enough red balls drawn from some urn, one might ‘take the plunge’ and come to know that all the balls in the urn are red. After seeing a disproportionate number of Heads, one might ‘take the plunge’ and come to know that a coin is biased. But if such inferential knowledge can further serve as evidence in its own right, then we might allow a suspicious sort of bootstrapping. On the previously sketched picture of conditionalization, such an inference looks like choosing some event with sufficiently high conditional probability and throwing out the few cases where it doesn’t obtain. This looks bad enough already. But if we keep doing this, eventually the small risks of error may add up to a large risk of error. Yet, since our base of evidence keeps growing as we gain more knowledge, it looks like the later inferences are actually more secure than our earlier inferences. This result seems troubling; it might be better to restrict evidence to non-inferential knowledge.
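To see the objector’s arithmetic concretely (a toy illustration with made-up numbers): if each ‘plunge’ tolerated an independent 1% chance of error, the chance that at least one plunge has gone wrong grows steadily with the number of plunges.

```python
# Toy illustration of how small, independent error-risks would compound.
per_plunge_risk = 0.01          # made-up: each plunge tolerates a 1% chance of error

for n in (1, 10, 50, 100):
    # Chance that at least one of n independent plunges has gone wrong.
    total_risk = 1 - (1 - per_plunge_risk) ** n
    print(f"after {n:>3} plunges: risk of some error = {total_risk:.2%}")
```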

Here is a multi-part response. The first thing to note is that ‘taking the plunge’ does not amount to simply ruling out events with negligibly low chance; rather, it involves ruling out events with negligible close chance – events with no appreciable chance of obtaining in any sufficiently close case. Thus, error-risk does not accumulate in the suspicious way. Of course, all else equal, things with low chance are likelier to lack any appreciable close chance (since a higher chance means more ways of showing up among the close chances), so there is a connection between no-high-chance probabilistic safety and no-close-chance modal safety. Spelling out the latter more precisely is somewhat difficult,8 but the objector also faces the challenge of spelling out which knowledge counts as too inferential to be evidence. Depending on how this is done, the objector may end up ruling out much of our knowledge, leaving us with very little evidence to work with: perhaps the vast majority of our knowledge is inferential in some sense. Further, although this objection sounds worrying in the abstract, it’s hard to see any really difficult cases. We might expect that any particular case will look either like one really does have inferential knowledge, in which case it will be fine to treat that knowledge as evidence, or else like the inference is too shaky for the conclusion to be known in the first place. Speaking abstractly allows the objector to illicitly run these two possibilities together, giving the false impression that there are cases where some inference is solid enough to yield knowledge but too shaky to serve as evidence. Finally, we have the fallback move of treating some evidence as inadmissible for justifying further inference, thereby blocking any suspicious bootstrapping. The upshot is that we already have many resources to explain away this worry, so that no drastic change to our theory is needed to address it.

5. Coda

Given the operationalization of evidence assumed above, I’ve effectively argued for the following (synchronic) norm: consider all and only those possibilities which, given everything you know, may actually be the case. Among the possibilities under consideration, some outcomes will be more common, and others will be less common. Given a background (‘ur-prior’) probability measure over a set of ‘all’ possibilities, considering a particular subset naturally induces a probability measure on that subset. Credence stands to this evidentially-induced probability as belief stands to knowledge. (Compare Knowledge and Its Limits, Ch. 10.)

We get the diachronic norm of updating beliefs via conditionalization as a special case, when one experiences a monotonic increase in knowledge. If one forgets something (perhaps by falling asleep, or being force-fed a memory pill – nothing irrational!), then one’s rational credences are not updated by conditionalization. In particular, one can – entirely rationally – go from having full credence in something to having less-than-full credence in it. (Incidentally, this shows that there’s nothing scary, permanent, or overconfident about full credence.) One may compare the resulting picture to Hedden’s ‘Time-Slice Rationality’ (2015), although the motivation differs. Such convergence from disparate considerations is good news.
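The special case can be checked on a toy model (again a sketch with illustrative numbers): when knowledge only grows, conditioning step by step on each new piece of knowledge agrees with conditioning the ur-prior on one’s total knowledge all at once; when a piece of knowledge is lost, the recommended credences revert to conditioning on the smaller stock, so full credence in a proposition can rationally drop below 1.

```python
# Credence = ur-prior conditioned on total knowledge, on a toy four-world space.
ur_prior = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}   # illustrative numbers

def conditioned(prior, b):
    """Discard worlds outside b and renormalize; b must have positive prior."""
    pb = sum(prior.get(w, 0.0) for w in b)
    return {w: prior[w] / pb for w in b if w in prior}

def prob(measure, a):
    """Probability of proposition a under the given measure."""
    return sum(p for w, p in measure.items() if w in a)

K1 = {"w1", "w2", "w3"}          # earlier knowledge: 'not w4'
K2 = {"w1", "w2", "w4"}          # later knowledge: 'not w3'

# Monotonic growth: step-by-step conditioning matches conditioning on K1 & K2.
stepwise = conditioned(conditioned(ur_prior, K1), K2)
all_at_once = conditioned(ur_prior, K1 & K2)
for w in all_at_once:
    assert abs(stepwise[w] - all_at_once[w]) < 1e-12

# Forgetting K2: credences revert to conditioning on K1 alone, so credence in
# 'not w3' rationally drops from 1 to below 1 (which conditionalization alone
# could never produce).
not_w3 = {"w1", "w2", "w4"}
print("credence in 'not w3' while K2 is known:", prob(all_at_once, not_w3))
print("credence in 'not w3' after forgetting K2:", prob(conditioned(ur_prior, K1), not_w3))
```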

Footnotes

  1. One may verify that our logical notation corresponds nicely to the set-theoretic operations; for instance, $\vee$ (definable with $\neg$ and $\wedge$) must be set-theoretic union. Those with interests in higher-order metaphysics may verify that this results in a (not generally atomic) Boolean algebra.

  2. This ensures finite additivity, but we usually want countable additivity too: if a countable family of propositions is pairwise disjoint (the conjunction of any two is a contradiction), then the measure of their union is the sum of their measures. See also Jaynes’s Probability Theory, Ch. 15 (contra de Finetti).

  3. If $\mu(\Omega)$ is positive and finite, we can just define a probability measure as $P(A) = \mu(A)/\mu(\Omega)$.

  4. See also Jaynes’s Probability Theory, Ch. 5 (2002).

  5. By definition, $P(A \mid B) = P(A \cap B)/P(B)$, so $P(A \cap B) = P(A \mid B)\,P(B)$. Substituting into $P(B \mid A) = P(A \cap B)/P(A)$, we have $P(B \mid A) = P(A \mid B)\,P(B)/P(A)$, which is Bayes’ Theorem.

  6. This framing works less well in establishing that all evidence counts as knowledge, since it doesn’t seem obvious that hypotheses only gain support by predicting things that we know (as opposed to, say, things which we reasonably believe).

  7. Of course, being too cautious might mean failing to retain one’s outright belief and so losing one’s knowledge. But see Salow’s ‘Iterated Knowledge Isn’t Better Knowledge’ (in prep.) for concerns about this type of explanation.

  8. Though see Williamson’s ‘Probability and Danger’ (2009).