By Tom Herzog

Sharon Bertsch McGrayne has devoted seven and a half years to writing *The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines & Emerged Triumphant From Two Centuries of Controversy*.

All of this painstaking work shows in what we regard as both a masterpiece and an invaluable contribution to the scientific literature in general and the actuarial literature in particular. The book is a treasure trove of citations to interesting items in the literature and of biographical information on key figures in Bayesian history, material that we found extremely difficult to locate prior to the arrival of this book.

The Role of Actuaries in the Application of Bayes’ Theorem

The thing that we found most amazing was the central part that actuaries played in this history. Most, as one would expect, were involved with practical applications of Bayesian statistics to real-life insurance problems. However, at least one—Bruno de Finetti—was deeply involved with foundational issues as well.

Whitney

In Chapter 3, the reader is introduced to Albert Whitney, who taught courses in probability for insurance professionals at the University of California at Berkeley. Whitney was well versed in Bayesian statistics. Of course, at that time, what we now call Bayesian statistics was called “inverse probabilities.” During 1918, Whitney was on a Casualty Actuarial Society committee whose task was to devise a scheme to set workers’ compensation rates based upon a “sound mathematical foundation.” Whitney realized that the correct “equations were too complicated for the fledgling workers’ compensation movement.” “One afternoon in the spring of 1918, during World War I, Whitney and his committee worked for hours stripping away every mathematical complication and substituting dubious simplifications.” The result was the credibility formula

Z=P/(P+K).

Thus, Whitney’s committee had produced “a stunningly simple formula that a clerk could compute [of course, this was well before computers had been invented], an underwriter could understand, and a salesman could explain to his customers.” The effect of this, as every actuary knows, is to pull all of the estimates of the premium rates closer to their overall mean. This produces the “shrinkage effect” that the statistician Charles Stein rediscovered in the 1950s.
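To make the shrinkage concrete, here is a minimal sketch of the formula in use. It assumes that P measures the volume of a risk's own experience and that K is a judgmentally chosen constant; the blend of the risk's own rate with the overall mean is the standard credibility-weighted estimate. The function names and all numbers are hypothetical illustrations, not figures from the book.

```python
def credibility_factor(p: float, k: float) -> float:
    """Whitney's credibility factor Z = P / (P + K)."""
    if p < 0 or k <= 0:
        raise ValueError("p must be non-negative and k must be positive")
    return p / (p + k)


def credibility_rate(own_rate: float, overall_rate: float, p: float, k: float) -> float:
    """Blend a risk's own rate with the overall mean using weight Z."""
    z = credibility_factor(p, k)
    return z * own_rate + (1.0 - z) * overall_rate


if __name__ == "__main__":
    overall = 1.00  # hypothetical overall mean rate
    k = 100.0       # hypothetical constant K
    # Hypothetical risks: (own observed rate, volume of experience P)
    for own, p in [(2.00, 10.0), (2.00, 100.0), (2.00, 10_000.0)]:
        z = credibility_factor(p, k)
        print(f"P={p:>8.0f}  Z={z:.3f}  blended rate={credibility_rate(own, overall, p, k):.3f}")
```

The smaller the volume of experience, the more strongly the blended rate is pulled toward the overall mean.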

Bailey and Bayes

Arthur Bailey studied actuarial science at the University of Michigan, from which he graduated in 1928. At Michigan he studied classical/frequentist statistics. In 1937 he began working for the American Mutual Alliance, a consortium of mutual insurance companies, where “he was in charge of setting premium rates to cover risks involving automobiles, aircraft, manufacturing, burglary, and theft.” Bailey’s first impression was “horror” when he realized that the companies’ underwriters were using Whitney’s “sledge-hammer” scheme that approximated a Bayesian approach. “As a modern statistical sophisticate, Bailey was scandalized.” During his first year at American Mutual, Bailey attempted to convince himself that the casualty actuaries’ Bayesian scheme was “mathematically unsound.” “After a year of intense mental struggle, Bailey realized to his consternation that actuarial sledgehammering worked. He even preferred it to the elegance of frequentism.” Bailey “realized that the hard-shelled underwriters were recognizing certain facts of life neglected by the” frequentist statisticians.

Bailey summarized his findings in a paper he presented at a CAS meeting on May 22, 1950. “First, he praised his colleagues for standing almost alone against the statistical establishment and for staging the only organized revolt against the frequentists’ sampling philosophy.” He went on to say that actuaries “marched ‘a step ahead’” of those in most other fields. “Then he announced the startling news that their beloved Credibility formula was derived from Bayes’ theorem.” He went on to mount a “frontal attack on frequentists” in general and their leader, R.A. Fisher, in particular. Bailey “concluded with a rousing call to reinstate prior knowledge in statistical theory.”

Bailey, Longley-Cook and Rare Events

Bailey “wanted to give more weight to a large volume of data than to the frequentists’ small sample; doing so felt surprisingly ‘logical and reasonable.’ He concluded that only a ‘suicidal’ actuary would use Fisher’s method of maximum likelihood [estimation] which assigned a zero probability to non-events. Since many businesses file no insurance claims at all, Fisher’s method would produce premiums too low to cover future losses.”
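Bailey's objection can be illustrated with a minimal sketch. It assumes a simple model in which a policyholder either files a claim or does not in each year, and it uses a Beta prior as one convenient way to encode prior knowledge. The prior parameters and the five-year example are hypothetical, not taken from Bailey's paper.

```python
def mle_claim_probability(claims: int, years: int) -> float:
    """Maximum likelihood estimate of the annual claim probability."""
    return claims / years


def posterior_mean_claim_probability(
    claims: int, years: int, prior_a: float = 1.0, prior_b: float = 1.0
) -> float:
    """Posterior mean under a Beta(prior_a, prior_b) prior (an assumed prior)."""
    return (claims + prior_a) / (years + prior_a + prior_b)


if __name__ == "__main__":
    # Hypothetical business with no claims in five years of experience.
    claims, years = 0, 5
    print("MLE:           ", mle_claim_probability(claims, years))
    print("Posterior mean:", posterior_mean_claim_probability(claims, years))
```

With no claims observed, the maximum likelihood estimate is exactly zero, so a premium based on it would be zero as well, whereas the Bayesian estimate stays positive.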

Unfortunately, Bailey died of a heart attack on Aug. 12, 1954, at age 49.

Fortunately, there were at least a few practicing actuaries who understood his message and were willing to apply it to real-world problems. A few months after Bailey’s death, the CEO of the Insurance Company of North America (INA) asked his chief actuary, L. H. Longley-Cook, if anyone “could predict the probability of two planes colliding in midair.” Since there had been no previous midair airplane crashes, frequentist methods were of no use. After considering this problem for several weeks, Longley-Cook responded that he expected anywhere from zero to four such crashes during the next ten years. He recommended that INA “should prepare for a costly catastrophe by raising premium rates for air carriers and purchasing reinsurance.” Indeed, on June 30, 1956, a United Airlines DC-7 Mainliner and a TWA Lockheed L-1049 Super Constellation collided over the Grand Canyon in Arizona, killing all 128 people aboard the two planes. Then, four years later, on December 16, 1960, a United Airlines DC-8 Mainliner and a TWA Lockheed L-1049 Super Constellation collided over New York City killing all 128 people aboard the two planes as well as six people on the ground.

De Finetti

Bruno de Finetti was an Italian actuary and professor of mathematics. He worked for a number of years at the Italian insurance company Assicurazioni Generali, located in Trieste, Italy. McGrayne credits de Finetti with putting the use of a subjective prior in Bayes’ theorem “on a firm mathematical foundation.” De Finetti is also well known for his theorem on exchangeable sequences of random variables.

In 1963, de Finetti hosted a meeting of Bayesians at Trieste, Italy. It was at that meeting that both Dennis Lindley (a famous English Bayesian statistician) and Hans Buhlmann (a well-known Swiss actuary) happily learned about Bailey’s work.

Differences in Results Using Bayesian and Frequentist Approaches

Estimation with Large Amounts of Data

With large samples, the influence of the prior is usually minimal, and so the results of Bayesian methods and maximum likelihood estimation are usually close to each other.
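A minimal sketch of this effect follows, assuming a binomial model with a Beta prior. The two priors and the observed rate of 0.3 are hypothetical choices made only for illustration.

```python
def posterior_mean(successes: int, trials: int, prior_a: float, prior_b: float) -> float:
    """Posterior mean of a binomial proportion under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


if __name__ == "__main__":
    for trials in (10, 1_000, 100_000):
        successes = trials * 3 // 10  # observed rate of 0.3 in every case
        mle = successes / trials
        low_prior = posterior_mean(successes, trials, 1.0, 9.0)   # prior mean 0.1
        high_prior = posterior_mean(successes, trials, 9.0, 1.0)  # prior mean 0.9
        print(f"n={trials:>7}  MLE={mle:.4f}  "
              f"low prior={low_prior:.4f}  high prior={high_prior:.4f}")
```

With ten observations the two priors give quite different answers. With 100,000 observations both posterior means are nearly indistinguishable from the maximum likelihood estimate.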

Testing Hypotheses Using Classical Methods with Voluminous Data

George Box has written that “All models are wrong, but some are useful.” A corollary of this is that if a frequentist’s model is slightly off and he has voluminous amounts of data, he will almost always reject his null hypothesis. This is why frequentists sometimes wish for less data rather than more. To us this seems perverse. Why operate under a statistical paradigm in which you want to live in an austere world where data are scarce?
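The point can be sketched with a two-sided z-test for a proportion. The example assumes a null hypothesis of 0.5 and an observed rate of 0.501, a practically negligible discrepancy. The numbers are hypothetical, and the normal approximation is an assumption of this sketch.

```python
import math


def two_sided_p_value(observed_rate: float, null_rate: float, n: int) -> float:
    """Two-sided p-value of a z-test for a proportion (normal approximation)."""
    standard_error = math.sqrt(null_rate * (1.0 - null_rate) / n)
    z = (observed_rate - null_rate) / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):
        p = two_sided_p_value(0.501, 0.5, n)
        verdict = "reject" if p < 0.05 else "do not reject"
        print(f"n={n:>10}  p-value={p:.3g}  -> {verdict} the null at the 5% level")
```

The same tiny discrepancy that is invisible with a thousand observations leads to rejection once the sample is large enough.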

Estimating Probabilities without Prior Observations

As we mentioned in the section on Bailey and Longley-Cook, Bayesian methods are the only ones that offer solutions when there are no observations. The example cited earlier involved airplane collisions. Chapter 9 treats “the probability that a thermonuclear bomb might explode by mistake.” Another similar situation that McGrayne raised after the publication of her book is computing the probability that Osama bin Laden was residing in a particular house in Abbottabad.

Dealing with Limited Amounts of Data: Posterior Odds Ratios as an Approach When Data Are Not Credible

When conducting clinical trials, drug companies frequently have to deal with small samples because the cost of each individual participant is high. With such limited data, the optimal scheme usually entails the use of Bayesian posterior odds. A well-known clinical trial was the 2000 study in which Merck compared the side effects of its non-steroidal anti-inflammatory drug, Vioxx, to those of naproxen—the generic name of a competing non-steroidal anti-inflammatory drug produced under a variety of brand names such as Aleve. In this study, eight of those treated with Vioxx suffered heart attacks versus only one in the control group treated with naproxen. Several researchers castigated Merck for claiming that because of the lack of statistical significance at the five percent level, there was no difference in the effects of the two drugs. Ultimately, Merck was compelled to pay billions of dollars in claims to the victims and families of those who had serious adverse effects from Vioxx.

It is expensive to collect data. Why disregard any available data? It does not make sense to say that “data are not credible.” Even with scarce data the actuary can still compute posterior odds. Under some conditions, it may be possible to compute posterior odds by ignoring prior probabilities. This eliminates the dreaded subjectivity of the Bayesian approach.
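As a rough illustration of posterior odds with scarce data, the sketch below uses the eight-versus-one heart attack counts quoted above. It rests on assumptions that are not in the study description given here: the two treatment groups are taken to have equal exposure, and the share of events falling in the first group, called theta, is given a uniform prior. Under those assumptions theta has a Beta(events_a + 1, events_b + 1) posterior, and the odds that the first group has the higher event rate can be computed exactly. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def posterior_odds_higher_rate(events_a: int, events_b: int) -> Fraction:
    """Posterior odds that group A has the higher event rate.

    Assumes equal exposure in both groups and a uniform prior on theta,
    the share of events falling in group A. The posterior for theta is
    Beta(a, b) with a = events_a + 1 and b = events_b + 1, and for
    integer a and b, P(theta <= 1/2) = P(Binomial(a + b - 1, 1/2) >= a).
    """
    a = events_a + 1
    b = events_b + 1
    n = a + b - 1
    ways_not_higher = sum(comb(n, j) for j in range(a, n + 1))
    ways_higher = 2**n - ways_not_higher
    return Fraction(ways_higher, ways_not_higher)


if __name__ == "__main__":
    odds = posterior_odds_higher_rate(8, 1)
    print(f"Posterior odds: {odds} (about {float(odds):.1f} to 1)")
```

Under these assumptions, nine events in total are enough to produce odds of roughly 92 to 1 that the first group's rate is the higher one.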

The Case against Ever Saying “Data Are Fully Credible”

Bayesians “include every datum possible because each one might change the answer by a small amount.” (See page 50.) This is a strong argument against ever giving full credibility of 100 percent, as is done under the classical approach. The classical approach has done tremendous harm to the actuarial profession. It has actuaries asking the wrong questions and, as a consequence, solving the wrong problems. An example of this is in the approach to risk-based capital known as C-3 Phase 3. Here, the reserving actuary is instructed to “use classical credibility.” A Bayesian approach, or even maximum likelihood estimation, would seem to be preferable.

Solving Real Problems

From page 209 of the book: “In 1979 NATO held a symposium in Portugal to encourage the solution of ‘real problems’ with Bayesian methods.” “Among the civilian attendees was Ray Hilborn, a newly minted zoology Ph.D. who was interested in saving the fish populations in the world’s oceans.” His overview of the symposium was that “Everyone who is actually interested in solving problems does things in a Bayesian way. The limit of [frequentist] approaches just isn’t obvious until you actually have to make some decisions. You have to be able to ask, ‘What are the alternative states of nature, and how much do I believe they’re true?’ [Frequentists] can’t ask that question. Bayesians, on the other hand, can compare hypotheses.”

Making Decisions under Uncertainty (Pages 145-146)

“Statisticians like [Howard] Raiffa and [Robert] Schlaifer [of the Harvard Business School] were increasingly interested in using [statistical methods] not just to analyze data but to make decisions. In contrast, Neyman and Pearson considered the errors associated with various strategies or hypotheses and then decided whether to [reject] them; they could not infer what to do based on observed sample outcomes without thinking about all of the potential sample outcomes that could have occurred but had not. This was [the English Bayesian Harold] Jeffreys’s objection to using frequentism for scientific inference. Raiffa felt the same way for different reasons; he wanted to make decisions tied to ‘real economic problems; not phony ones.’” The idea was to make decisions based on the probability of future outcomes. Raiffa wanted his students to construct entire probability distributions of future outcomes. (These are what we now call “predictive distributions.”) Raiffa felt that the concept of hypothesis testing was “leading students in the wrong direction.”
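A predictive distribution of the kind mentioned above can be sketched for the simplest case. The example assumes a binomial model with a Beta posterior, which yields a beta-binomial predictive distribution for the number of events in a set of future trials. The posterior parameters and the scenario are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def predictive_pmf(future_trials: int, a: float, b: float) -> list:
    """Beta-binomial predictive probabilities of 0..future_trials events.

    a and b are the parameters of the Beta distribution that currently
    describes the event probability (prior or posterior).
    """
    return [
        math.comb(future_trials, k)
        * math.exp(log_beta(a + k, b + future_trials - k) - log_beta(a, b))
        for k in range(future_trials + 1)
    ]


if __name__ == "__main__":
    # Hypothetical: uniform prior, then 2 events seen in 10 trials -> Beta(3, 9).
    for k, prob in enumerate(predictive_pmf(5, 3.0, 9.0)):
        print(f"P({k} events in the next 5 trials) = {prob:.4f}")
```

A decision maker can then weigh each possible future outcome by its predictive probability instead of reducing the analysis to a single accept-or-reject verdict.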

Argument for Subjective Prior Distributions (Page 79)

We have previously alluded to de Finetti’s justification for a subjective prior distribution. In her book, McGrayne argues in favor of a subjective prior by noting that: “No rational prospector would search for mineral deposits unless a geological study, or the experience of previous prospectors, showed a sufficiently high probability of their presence.” “Police will patrol localities of high incidence of crime. Public health officials will have ideas in advance of the likely sources of infection and will examine them first.”

Other noted researchers continue to argue against subjective priors. During the 1950s, Professor Herbert Robbins of Columbia University proposed a compromise between Bayesian and anti-Bayesian approaches. He suggested first using observed relative frequencies to estimate prior probabilities and then applying Bayes’ theorem. This approach was called “Empirical Bayes.” In a 1999 article, the Swiss actuary Hans Buhlmann was quoted as follows, putting him squarely in the Empirical Bayes camp:

“Whereas early Bayesian statisticians used the prior distribution of risk parameters as a means to express judgment (which in insurance we would call underwriting judgment), [I] think of the probability distribution of the risk parameters as having an objective meaning. Hence, it needs to be extracted from data gathered about the collective. (Only in the case of a lack of such data might one accept the subjective view faute de mieux.) For this reason, I have always insisted on speaking about the structural distribution of risk parameters, avoiding the standard Bayesian terminology, ‘prior distribution’.”

Much actuarial work is concerned with the revision of insurance premium rates. This is an example of where Bayes’ theorem should work fine and where there is typically ample data to incorporate into one’s prior distribution.
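The Empirical Bayes idea of extracting the structural distribution from data on the collective can be sketched as follows. This uses the textbook nonparametric credibility estimator as a stand-in illustration and is not claimed to be the exact procedure used by Robbins or Buhlmann. It assumes every risk has the same number of years of experience, and the claim figures are hypothetical.

```python
from statistics import mean, variance


def empirical_credibility(experience):
    """Estimate a common credibility factor Z and blended means from data.

    experience: one list of annual claim amounts per risk, all lists of
    equal length (an assumption of this sketch).
    """
    n_years = len(experience[0])
    risk_means = [mean(row) for row in experience]
    overall_mean = mean(risk_means)
    # Average variability within each risk's own experience.
    within = mean([variance(row) for row in experience])
    # Variability between risks, corrected for within-risk sampling noise.
    between = variance(risk_means) - within / n_years
    if between <= 0:
        z = 0.0  # no detectable difference between risks
    else:
        k = within / between
        z = n_years / (n_years + k)
    estimates = [z * m + (1.0 - z) * overall_mean for m in risk_means]
    return z, estimates


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    z, estimates = empirical_credibility(data)
    print(f"Z = {z:.4f}")
    print("Blended estimates:", [round(e, 4) for e in estimates])
```

Here the constant K in Z = P/(P+K) is no longer a judgment call. It is estimated from the collective's own data as the ratio of within-risk to between-risk variability.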

Statisticians as Members of Religious Sects

McGrayne concludes her book with a wonderful appendix in which she discusses “religious metaphors in statistics.”

She classifies “frequentists as ‘metaphorical Catholics’ [who divide] results into ‘significant’ and ‘non-significant’ instead of dividing sin into ‘mortal’ (i.e., significant) and venial. Randomization is the grace that saves the world.”

“On the other hand Bayesians are born-again fundamentalists. One must be a ‘believer’ and Bayesians can pinpoint the day when Bayes came into their lives, when they dropped their childish frequentist ways.”

Closing

We close with another religious story. When the well-known Bayesian statistician Dennis Lindley joined the faculty at University College London, a colleague remarked that it was “as though a Jehovah’s Witness had been elected Pope.”

*Thomas N. Herzog, PhD, ASA, MAAA, is a retired actuary in Reston, Va. He can be reached at ResBayes@aol.com.*