
Bayesianism and the Philosophy of Science

Probabilistic Statements

The Bayesian approach treats probability statements as subjective, asserting that they represent one’s degrees of belief. It does not require that one’s degrees of belief align with frequencies. What Bayesianism requires is that “a coherent set of degrees of beliefs has to follow the standard rules of the mathematics of probability” (Godfrey-Smith, 206). The “Dutch book argument” or “sure-loss contract” is meant to show the importance of such probabilistic coherence (Ibid., 208; Staley, 114). A Dutch book shows that particular degrees of belief, if put into practice, would ensure a loss. For instance, if, say, Mr. X assigns a .6 probability to the proposition that it will snow today and a .6 probability to the proposition that it will not snow today, Mr. X is not straightforwardly contradicting himself. But the problem arises when Mr. X realizes that he should be willing to pay $6 for a bet that pays $10 if it snows, and that he should be willing to pay $6 for a bet that pays $10 if it does not snow. At the end of the day, whether it snows or not, he will have spent $12 and gotten back only $10. It appears to be a failing of rationality if acting on one’s beliefs would cause one to lose money no matter how the world goes. It can be shown, mathematically, that if one’s degrees of belief obey the probability calculus, no Dutch book can be made against that person.
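
To make the arithmetic concrete, here is a minimal sketch in Python of the sure-loss contract described above. The stake sizes are simply those from the example: a degree of belief of .6 makes $6 a fair price for a bet that pays $10.

```python
# Sketch of the Dutch book against Mr. X's incoherent degrees of belief.
# He assigns 0.6 to "snow" and 0.6 to "no snow", so his fair price for a
# $10 bet on each proposition is $6.

payout = 10.0
belief_snow = 0.6
belief_no_snow = 0.6          # coherent beliefs would sum to 1.0

price_snow_bet = belief_snow * payout        # $6
price_no_snow_bet = belief_no_snow * payout  # $6
total_paid = price_snow_bet + price_no_snow_bet  # $12

for it_snows in (True, False):
    winnings = payout            # exactly one of the two bets pays off
    net = winnings - total_paid  # -$2 either way: a guaranteed loss
    print(f"snow={it_snows}: net = {net:+.2f}")
```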

Yet insane sets of belief can still be probabilistically coherent. Bayesianism becomes a theory of scientific rationality by developing a theory of how one should handle evidence. The first aspect of the theory is a notion of confirmation as the raising of a hypothesis’s probability. The Bayesian approach regards confirmation as inherently quantitative, which is to say that one cannot ask whether a piece of evidence, E, confirms a hypothesis, H, unless one knows how probable H started out being: one needs a “prior probability” for H. E confirms H just in case E raises the prior probability of H, that is, just in case the probability of H given E is higher than the probability of H was before: P(H/E) > P(H). E disconfirms H if P(H/E) < P(H).

This all proceeds within a subjective interpretation of probability. For instance, a large cloud on a clear horizon counts as evidence of snow for me just in case my subjective probability that it will snow, given the new information that there is a large cloud on the horizon, is higher than my prior probability that it would snow. This idea makes tacit use of the notion of conditional probability: the probability of the hypothesis conditional on, or given, the evidence. The conditional probability of H given E is the probability of (H&E) divided by the probability of E, provided that E has a nonzero probability: P(H/E) = P(H&E)/P(E). (H&E) is the intersection, the overlap, of cloudy days and snowy days. The definition says that the higher the percentage of cloudy days that are snowy, the higher the conditional probability of H given E. If I were already convinced that it would snow (say, for some other reason, like a weather report), then this high conditional probability of snow given clouds would not change my prior belief and, thus, the cloud would not count as evidence. But if I had been relatively neutral, it might significantly confirm the snow hypothesis for me. (The idea that whatever raises the probability of H confirms H has problems. For instance, my seeing Angelina Jolie on the street might raise the probability that she and I will make a movie together, but it hardly seems to count as evidence that we will in fact make that movie.)
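
A short sketch of the confirmation criterion, using invented subjective probabilities for the cloud-and-snow example: E confirms H exactly when P(H/E) = P(H&E)/P(E) exceeds P(H).

```python
# Illustrative (invented) subjective probabilities for the snow example.
p_h = 0.3          # prior probability of snow (H)
p_e = 0.4          # probability of a large cloud on the horizon (E)
p_h_and_e = 0.25   # probability of cloud-and-snow together (H & E)

# Definition of conditional probability (requires P(E) > 0).
p_h_given_e = p_h_and_e / p_e   # = 0.625

if p_h_given_e > p_h:
    print(f"E confirms H: P(H/E) = {p_h_given_e:.3f} > P(H) = {p_h}")
elif p_h_given_e < p_h:
    print("E disconfirms H")
else:
    print("E is evidentially irrelevant to H")
```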

The second aspect crucial to Bayesianism is that beliefs should be updated in accordance with Bayes’s Theorem. The theorem itself is a straightforward consequence of the definition of conditional probability. Non-Bayesians, arguably, would accept the truth of the theorem but they would not put it to the same use as Bayesians. The statement of the theorem is as follows:

P(H/E) = [P(E/H) × P(H)] / P(E)

The left side of the statement is the conditional probability of the hypothesis given the evidence. It can have two different readings, depending on whether the evidence is available yet or not. If the evidence is not available, then P(H/E) is the prior conditional probability of H given E. For instance, a physicist in 1915 might have assigned a low probability to Einstein’s hypothesis of general relativity, but he also might have thought that if it turns out that light rays are bent by the Sun, he would assign a quite high probability to Einstein’s hypothesis. If the evidence is available, then P(H/E) represents the posterior probability of the hypothesis. It is the probability this same 1915 physicist now assigns to Einstein’s hypothesis once he has gotten the news that light rays are bent.

Let us now turn to the right side of the statement. P(E/H) measures how unsurprising the evidence is given the hypothesis (which seems like a Popperian virtue). So, given Einstein’s hypothesis of general relativity, the probability that light rays are bent by the Sun’s gravitational field is quite high. P(H) is just the prior probability of the hypothesis. The posterior probability (that is, the left side of the equation) is directly proportional to the prior probability of the hypothesis and directly proportional to the extent to which the hypothesis makes the evidence unsurprising. The prior probability of the evidence is the denominator of the fraction, reflecting the fact that, all other things being equal, unexpected evidence raises posterior probabilities a lot more than expected evidence does. Apart from Einstein’s theory, the probability of light being bent by the Sun was quite low. It is because the prediction was so unexpected, except in light of Einstein’s theory, that the evidence had so much power to confirm the theory. Thus, the more unexpected a given bit of evidence is apart from a given hypothesis and the more expected it is according to the hypothesis, the more confirmation the evidence confers on the hypothesis.
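
As a rough numerical illustration (the probabilities here are invented, not the 1915 physicist’s actual credences), Bayes’s Theorem shows how a low P(E) amplifies confirmation:

```python
# Invented numbers loosely modeled on the light-bending example.
p_h = 0.1               # prior probability of general relativity
p_e_given_h = 0.95      # probability of bent light, given the theory
p_e_given_not_h = 0.05  # probability of bent light, apart from the theory

# Total probability of the evidence (law of total probability).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # = 0.14

# Bayes's Theorem: P(H/E) = P(E/H) * P(H) / P(E)
posterior = p_e_given_h * p_h / p_e
print(f"P(E) = {p_e:.3f}, posterior P(H/E) = {posterior:.3f}")   # ~0.679

# If the evidence had been expected anyway (say P(E/not-H) = 0.9),
# the same prior would barely move:
p_e_alt = p_e_given_h * p_h + 0.9 * (1 - p_h)
print(f"expected-evidence posterior = {p_e_given_h * p_h / p_e_alt:.3f}")  # ~0.105
```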

More controversially, Bayesians propose as a rule of rationality that, once the evidence comes in, the agent’s posterior probability for H should equal the agent’s prior conditional probability for H given E. This sounds uncontroversial, but the mathematics by itself will not yield this result. Once the evidence comes in, one could maintain probabilistic coherence by altering some of the other subjective probabilities, namely, some of the numbers on the right side. One could decide that the evidence was not that surprising after all, for instance, thereby making the posterior probability at hand different from the prior conditional probability. Why should today’s priors be tomorrow’s posteriors? (Godfrey-Smith, 206). Here Bayesians can appeal to a diachronic Dutch book argument to support this requirement: if one uses any rule other than Bayesian conditionalization to update one’s beliefs, then a bookie who knows that person’s method can exploit it by offering a series of bets, some of which depend on that person’s future degrees of belief.
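
The conditionalization rule itself can be stated in a couple of lines. This is only a schematic sketch with invented numbers, not a treatment of the diachronic Dutch book argument:

```python
# Schematic sketch of conditionalization (illustrative values).
# Before the evidence, the agent fixes a prior conditional probability.
prior = {"P(H)": 0.10, "P(H/E)": 0.68}   # invented credences

def conditionalize(credences):
    """Once E is learned, the new unconditional credence in H is simply
    set equal to the old conditional credence P(H/E)."""
    return {"P(H)": credences["P(H/E)"]}

posterior = conditionalize(prior)
print(posterior)   # {'P(H)': 0.68}: today's P(H/E) has become the new P(H)
```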

Regarding evidence and justification in scientific reasoning, the Bayesian approach allows for impressive subjectivity (there are very few limitations on prior probabilities other than coherence with one’s other degrees of belief) and impressive objectivity (there is one correct way of updating one’s beliefs in the face of new evidence). Bayesians argue that the initial subjectivity disappears when enough good evidence comes in, which Staley and Godfrey-Smith refer to as “convergence,” or the “washing out of prior probabilities” (Staley, 133; Godfrey-Smith, 209). The idea is that it can be established that, no matter how great the disagreement between two people, there is some amount of evidence that will bring their posterior probabilities as close together as we would like.
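
A small simulation, using an invented biased-coin hypothesis, illustrates the washing-out claim: two agents with very different priors, but the same likelihoods, are driven toward the same posterior as evidence accumulates.

```python
# Two hypotheses about a coin: H = "biased toward heads, P(heads)=0.8",
# not-H = "fair, P(heads)=0.5". The agents disagree sharply about the prior
# but agree on the likelihoods (as the washing-out results require).
p_heads_if_h, p_heads_if_not_h = 0.8, 0.5

def update(prior_h, outcome):
    """One step of Bayesian conditionalization on a coin-flip outcome."""
    like_h = p_heads_if_h if outcome == "H" else 1 - p_heads_if_h
    like_not = p_heads_if_not_h if outcome == "H" else 1 - p_heads_if_not_h
    p_e = like_h * prior_h + like_not * (1 - prior_h)
    return like_h * prior_h / p_e

agent_a, agent_b = 0.95, 0.01        # wildly different priors for H
evidence = "H" * 40 + "T" * 10       # 50 flips, 80% heads

for outcome in evidence:
    agent_a = update(agent_a, outcome)
    agent_b = update(agent_b, outcome)

print(f"agent A: {agent_a:.4f}, agent B: {agent_b:.4f}")  # both end up near 1
```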

That seems promising, but it is subject to some significant limitations. First, if one person assigns a prior probability of 0 to a hypothesis, no evidence will ever increase that probability; the prior sits in the numerator of the fraction, and multiplying by zero yields 0. But more importantly, there is no assurance that convergence will happen in a reasonable amount of time. And lastly, the washing-out results require that the agents agree about the probabilities of all the various pieces of evidence given the hypothesis in question. This seems very problematic. How can we impose such a requirement of agreement on two people who disagree so much? It suggests, unrealistically, that people could deeply disagree about the probabilities of hypotheses while agreeing about how the evidence bears on those hypotheses.